GENETIC METHOD

The present invention relates to a method for the coexpression of two or more proteins 
in plants within a single transcription unit, to linker sequences for use in the method of the 
invention, to DNA constructs for use in the invention and to plants transformed with the 
constructs of the invention. 

For many applications based on genetic modification of plants by transgenesis, it is 
desirable to express co-ordinately two or more transgenes. For instance, coexpression in 
plants of transgenes encoding antimicrobial proteins with different biochemical targets can 
result in enhanced disease resistance levels, resistance against a broader range of pathogens, or 
resistance that is more difficult to overcome by mutational adaptation of pathogens. Other 
examples include those aimed at producing a particular metabolite in transgenic plants by 
coexpression of multiple transgenes that are involved in a biosynthetic pathway. There are 
different ways to obtain transgenic plants expressing multiple transgenes. One frequently 
chosen option is to introduce each transgene individually via separate transformation events 
and to cross the different single-transgene expressing lines. The drawback of this method is 
that the different transgenes in the resulting progeny will be inserted at different loci, which 
complicates the subsequent breeding process. Moreover, this method is not applicable to 
crops that are propagated vegetatively, such as for instance potato, many ornamentals and fruit 
tree species. A second possibility is to introduce the different transgenes as linked expression 
cassettes, each with their own promoters and terminators, within a single transformation 
vector. Such a set of transgenes will in this case segregate as a single genetic locus. It has 
been observed, however, that the presence of multiple copies of the same promoter within a 
transgenic plant often results in transcriptional silencing of the transgenes (Matzke, M. A. and 
Matzke, A.J.M., 1998, Cellular and Molecular Life Sciences 54, 94-103). In an attempt to
introduce a vector containing four linked transgenes each driven by a CaMV35S promoter,
Van den Elzen P.J. et al. (Phil. Trans. R. Soc. Lond. B, 1993, 342, 271-278) observed that none
of the analysed transgenic lines expressed all four transgenes at a reasonably high level. To 
avoid this problem one could use different promoters for each of the expression cassettes used 
in the construct. However, there is currently only a very limited choice of promoter sets that 
have comparable characteristics in terms of expression levels, cell-type and developmental 
specificity and response to environmental factors. A third option would be to produce 
multiple proteins from one transcription unit by separating the distinct coding regions by
so-called internal ribosomal entry sites, which allow ribosomes to reinitiate translation at
internal positions within an mRNA species. Although internal ribosomal entry sites are well
documented in animal systems (Kaminski A. et al., 1994, Genet. Eng. 16, 115-155) it is not
known at present whether such sites are also functional in nuclear-encoded genes from plants.
Polycistronic genes can be expressed when inserted in plant chloroplastic genomes (Daniell H.
et al., 1998, Nature Biotechnology 16, 345-348) but the gene products in this case are confined
to the chloroplast, which is not always the preferred site of deposition of foreign proteins. A 
fourth strategy, finally, is based on the production of multiple proteins by proteolytic cleavage 
of a single polyprotein precursor encoded by a single transcription unit. Potyviruses, for 
instance, translate their genomic RNA into a single polyprotein precursor that encompasses 
proteolytic domains able to cleave the polyprotein precursor in cis (Dougherty, W.G. and 
Carrington, J.C., 1988, Annu. Rev. Phytopathol. 26, 123-143). Beck von Bodman, S. et al.
(1995, Bio/Technology 13, 587-591) have already exploited the potyviral system to co-express
two enzymes involved in the biosynthesis of mannopine. The two biosynthetic enzymes were
fused within one open reading frame together with a protease derived from a potyviral
polyprotein precursor, and the adjoining regions were separated by spacers eight amino acids
long representing specific cleavage sites for the protease. The plants transformed with this
construct synthesized mannopine, suggesting that the two enzymes had somehow been 
produced in a form that was at least partially functional, although direct evidence for the 
presumed cleavage events in planta was not presented. A disadvantage of this system is that a 
viral protein needs to be co-expressed with proteins of interest, which is not always desirable. 
More recently, Urwin P.E. et al. (1998, Planta 204, 472-479) have shown that it is possible to 
co-express two different proteinase inhibitors joined by a protease-sensitive propeptide derived 
from a plant metallothionein-like protein. A polyprotein precursor consisting of a cysteine 
protease inhibitor (oryzacystatin from rice), a propeptide from pea metallothionein-like
protein and a serine protease inhibitor (cowpea trypsin inhibitor), was found to be cleaved in 
transgenic Arabidopsis thaliana plants. The cleavage, however, was only partial, as uncleaved 
polyprotein precursor could also be detected in the transgenic plants. As the polyprotein 
precursor did not contain a leader peptide, the translation products are predicted to be 
deposited in the cytosol. The metallothionein from which the propeptide was derived also does 
not contain a leader peptide (Evans I.M., 1990, FEBS Lett. 262, 29-32) and hence its processing
must occur in the cytosol. For some applications, cytosolic processing and deposition is a
drawback. Many proteins, especially glycosylated proteins or proteins with multiple disulfide 
bridges, must be synthesized in the secretory pathway (encompassing the endoplasmic 
reticulum and Golgi apparatus) in order to be folded in a functional form (Bednarek and 
Raikhel 1992, Plant Mol. Biol. 20, 133-150). In addition, for some applications such as for 
instance the expression of antimicrobial proteins, the extracellular space is the preferred 
deposition site, as most microorganisms occur at least during the early stages of infection in 
the extracellular space. Proteins destined to the extracellular space are also synthesised via the 
secretory pathway but lack additional targeting information other than the leader peptide 
(Bednarek and Raikhel 1992, Plant Mol. Biol. 20, 133-150). 

The present invention provides a convenient and highly efficient method of co- 
expressing two or more proteins in a plant as a single transcription unit where the two proteins 
are joined by a cleavable linker, the construct being designed such that cleavage occurs in the 
secretory pathway of the plant thereby releasing the proteins extracellularly. 

According to the present invention there is provided a method for the expression of 
multiple proteins in a transgenic plant comprising inserting into the genome of said plant a 
DNA sequence comprising a promoter region operably linked to a signal sequence said signal 
sequence being operably linked to two or more protein encoding regions and a 3' -terminator 
region wherein said protein encoding regions are separated from each other by a DNA 
sequence coding for a linker propeptide said propeptide providing a cleavage site whereby the 
expressed polyprotein is post-translationally processed into the component protein molecules. 
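
The order of elements in such a transcription unit can be sketched in a few lines of Python. This is an illustration only: the `Element` type, the `build_transcription_unit` helper and the element names are hypothetical placeholders, not part of the described constructs.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Element:
    kind: str  # "promoter", "signal", "protein", "linker" or "terminator"
    name: str  # free-text label; all labels below are placeholders


def build_transcription_unit(promoter: str, signal: str, proteins: List[str],
                             linker: str, terminator: str) -> List[Element]:
    """Lay out promoter > signal > protein (> linker > protein)... > terminator."""
    if len(proteins) < 2:
        raise ValueError("at least two protein encoding regions are required")
    unit = [Element("promoter", promoter), Element("signal", signal)]
    for index, protein in enumerate(proteins):
        if index > 0:
            # A linker propeptide coding sequence separates adjacent coding regions.
            unit.append(Element("linker", linker))
        unit.append(Element("protein", protein))
    unit.append(Element("terminator", terminator))
    return unit


unit = build_transcription_unit(
    promoter="example promoter",
    signal="example signal sequence",
    proteins=["protein A", "protein B"],
    linker="example linker propeptide",
    terminator="example 3'-terminator",
)
layout = " > ".join(element.kind for element in unit)
print(layout)
```

A single promoter and a single signal sequence serve all coding regions, which is the point of the single transcription unit.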

The two or more protein encoding regions according to all aspects of the invention 
preferably do not encode identical proteins i.e. the method of the invention allows the 
production of different proteins in a single transcription unit. The DNA sequence to be 
expressed according to the method of the invention is one which does not occur naturally in 
the plant used for the production of the multiple proteins i.e. one or more of the components of 
the DNA sequence will be heterologous to the plant host. 

The method for the expression of multiple proteins described herein does not cover the 
use of a linker propeptide derived from the Ib-AMP gene as described in SEQ ID Nos 14, 15,
16, 17 or 18 of Published International Patent Application No. WO 95/24486 separating three 
protein encoding regions each of which encodes Rs-AFP2 and the insertion thereof into a plant 
genome. 

Accordingly, the present invention provides a method for the expression of
multiple proteins in a transgenic plant comprising inserting into the genome of said plant a
DNA sequence comprising a promoter region operably linked to a signal sequence said signal
sequence being operably linked to two or more protein encoding regions and a 3'-terminator
region wherein said protein encoding regions are separated from each other by a DNA
sequence coding for a linker propeptide said propeptide providing a cleavage site whereby the
expressed polyprotein is post-translationally processed into the component protein molecules
with the proviso that when the linker propeptide is derived from the Ib-AMP gene as described
in SEQ ID Nos 14, 15, 16, 17 or 18 of Published International Patent Application No. WO
95/24486 it does not separate three protein encoding regions each of which encodes Rs-AFP2.

The sequence of Rs-AFP2 is fully described in Published International patent 
Application no. WO 93/05153 published 18 March 1993. 

As used herein the term signal sequence is used to define a sequence encoding a leader 
peptide that allows a nascent polypeptide to enter the endoplasmic reticulum and is removed 
after this translocation.

The signal sequence may be derived from any suitable source and may for example be 
naturally associated with the promoter to which it is operably linked. We have found the use 
of signal sequences from the class of plant proteins known as defensins (Broekaert et al., 1995,
Plant Physiol. 108, 1353-1358; Broekaert et al., 1997, Crit. Rev. Plant Sci. 16, 297-323) to be
particularly suitable for use in the method of the invention.

The promoter sequence may for example be that naturally associated with the signal 
sequence, and/or it may be that naturally associated with the protein encoding sequence to 
which it is linked, or it may be any other promoter sequence conferring transcription in plants. 
It may be a constitutive promoter or it may be an inducible promoter. 

The linker propeptide for use in all aspects and embodiments of the invention
described herein is preferably a linker propeptide which is cleaved on passage of the
polyprotein precursor through the secretory pathway of the plant cells in which
the polyprotein-encoding DNA is expressed. The linker propeptide is preferably designed or
chosen such that cleavage of the propeptide occurs by proteases which are naturally present in
the secretory pathway of the plant cell in which the DNA encoding the polyprotein is
expressed.



PPD 50378/GB 



In a preferred embodiment the invention therefore provides a method for the 
expression of multiple proteins in a transgenic plant comprising inserting into the genome of 
said plant a DNA sequence comprising a promoter region operably linked to a signal sequence 
said signal sequence being operably linked to two or more protein encoding regions and a 3'- 
terminator region wherein said protein encoding regions are separated from each other by a 
DNA sequence coding for a linker propeptide said propeptide providing a cleavage site 
whereby the expressed polyprotein is post-translationally processed into the component protein 
molecules said linker propeptide being cleaved on passage of the
polyprotein precursor through the secretory pathway of the plant cells in which the
polyprotein-encoding DNA is expressed.

The method for the expression of multiple proteins described herein in all its 
embodiments does not cover the use of a linker propeptide derived from the Ib-AMP gene as
described in SEQ ID Nos 14, 15, 16, 17 or 18 of Published International Patent Application
No. WO 95/24486 separating three protein encoding regions each of which encodes Rs-AFP2 
and the insertion thereof into a plant genome. 

In a particularly preferred embodiment the invention provides a method for the 
expression of multiple proteins in a transgenic plant comprising inserting into the genome of 
said plant a DNA sequence comprising a promoter region operably linked to a signal sequence 
said signal sequence being operably linked to two or more protein encoding regions and a 3'- 
terminator region wherein said protein encoding regions are separated from each other by a 
DNA sequence coding for a linker propeptide said propeptide providing a cleavage site 
whereby the expressed polyprotein is post-translationally processed into the component protein 
molecules said linker propeptide being cleaved on passage of the
polyprotein precursor through the secretory pathway of the plant cells in which the
polyprotein-encoding DNA is expressed wherein cleavage of the propeptide occurs by proteases which are
naturally present in the secretory pathway of said plant cell. 

The linker propeptide may be a peptide which naturally contains processing sites for 
proteases occurring in the secretory pathway of plants such as the internal propeptides derived
from the Ib-AMP gene which are described further herein, or may be a peptide to which such a
protease processing site has been engineered at either or both ends thereof to facilitate
cleavage of the sequence. Where a propeptide possesses one such protease processing site a
further protease processing site may be added. For example, as described fully herein, a
further protease processing site has been added to the 3' end of the DNA sequence coding for
the C-terminal propeptides from Dahlia and Amaranthus which naturally possess a protease
processing site at their N-terminal end for an unknown secretory pathway protease and these
peptides are particularly suitable for use according to the method of the invention.

The linker propeptide according to the invention is preferably not derived from a virus.

In the present invention, we have developed two novel strategies for making artificial
polyprotein precursors which are cleaved in the secretory pathway. The first one was based on
the use of a propeptide derived from the IbAMP gene. IbAMP is a gene from the plant
Impatiens balsamina which encodes a peculiar polyprotein precursor featuring a leader peptide
and six consecutive antimicrobial peptides, each flanked by propeptides ranging from 16 to 28
amino acids in length (Tailor R.H. et al., 1997, J. Biol. Chem. 272, 24480-24487). It is not
known how and where processing of the IbAMP precursor occurs in its plant of origin. One of
the internal propeptides from IbAMP was used to separate two distinct plant defensin coding
regions, one originating from radish seed (RsAFP2, Terras F.R.G. et al., 1992, J. Biol. Chem.
267, 15301-15309; Terras et al., 1995, Plant Cell 7, 573-588) and one from dahlia seed
(DmAMP1, Osborn R.W. et al., 1995, FEBS Lett. 368, 257-262). The other strategy was
based on the use of C-terminal propeptides from either the DmAMP1 precursor or the
AcAMP2 precursor (De Bolle M.F.C. et al., 1993, Plant Mol. Biol. 22, 1187-1190). These
C-terminal propeptides were chosen based on our previous observation that they apparently can
be cleaved in transgenic tobacco plants without influencing extracellular deposition of the
mature proteins to which they are connected in the precursor (R.W. Osborn and S.
Attenborough, personal communication; De Bolle M.F.C. et al., 1996, Plant Mol. Biol. 31,
993-1008), implying that such cleavage is performed by a protease present in the secretory
pathway excluding the vacuole. To convert these C-terminal propeptides to internal
propeptides, a subtilisin-like protease processing site was engineered at the C-terminal part of
the propeptides. Subtilisin-like proteases are enzymes that specifically cleave at recognition
sites of which the last two residues are basic (Barr, P.J., 1991, Cell 66, 1-3; Park C.M. et al.,
1994, Mol. Microbiol. 11, 155-164). Although subtilisin-like proteases are best documented
in fungi (e.g. Kex2-like proteases) and higher animals (e.g. furins), recent evidence suggests
that such enzymes are also present in plants (Kinal H. et al., 1995, Plant Cell 7, 677-688;
Tornero P. et al., 1997, J. Biol. Chem. 272, 14412-14419), including Arabidopsis (Ribeiro A.
et al., 1995, Plant Cell 7, 785-794).

We have found that polyprotein precursors consisting of a leader peptide followed by
two different plant defensins separated from each other by any of the above described internal
propeptides can be processed in transgenic plants to release both plant defensins
simultaneously. Cleavage occurs such that at least the major part of the plant
defensins is deposited in the extracellular space. Hence processing of the precursor occurred
either in the secretory pathway or in the extracellular space. The different propeptides shown
to be cleaved in the transgenic plants show no primary sequence homology. However, the
sequences all appear to be rich in the small amino acids A, V, S and T and all contain
dipeptidic sequences consisting of either two acidic residues, two basic residues or one acidic
and one basic residue. Although propeptide cleavage in the examples shown in this invention
apparently did not occur within vacuoles, internal propeptides from vacuolar proteins (e.g. 2S
albumins) might also be used if vacuolar deposition of the proteins is desired. In the
co-expression experiments described here two different plant defensins were used but it is
predicted that similar results will be obtained when other types of proteins are used or
when more than two mature protein domains are used in the polyprotein precursor
structure.

Where it is desired to target the polyprotein to a particular cellular organelle along the 
secretory pathway a suitable targeting sequence may be added to one or more of the multiple 
protein encoding regions. For example, an endoplasmic reticulum targeting sequence such as 
that encoding KDEL may be added to the 3' end of one or more of the mature protein 
encoding regions, or a vacuolar targeting sequence (Chrispeels and Raikhel 1992, Cell 68, 613-
616) can be added to the 3' or 5' end of one or more of the protein encoding regions. An 
example of the latter is the barley lectin carboxy-terminal propeptide which has been shown to 
destine heterologous proteins that are otherwise secreted to the vacuoles (Bednarek and 
Raikhel 1991, Plant Cell 3, 1195-1206; De Bolle et al, 1996 Plant Mol. Biol. 31, 993-1008). 

At least 40% of the sequence of the linker propeptide for use in accordance with all 
aspects and methods of the invention as described herein preferably consists of stretches of 
either two to five consecutive hydrophobic residues selected from alanine, valine, isoleucine, 
methionine, leucine, phenylalanine, tryptophan and tyrosine or stretches of two to five 
hydrophilic residues selected from aspartic acid, glutamic acid, lysine, arginine, histidine, 
serine, threonine, glutamine and asparagine. 

The said hydrophobic residues are preferably alanine, valine, leucine, methionine
and/or isoleucine and the said hydrophilic residues are preferably aspartic acid, glutamic acid, 
lysine and/or arginine. 
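
The 40% preference can be checked for a candidate linker with a short script. The sketch below is one possible reading of the criterion and rests on assumptions that are not stated in the text: only maximal runs of two to five identical-class residues are counted, runs longer than five are ignored, and the fraction is taken over the full propeptide length. The function names and the test sequence are hypothetical.

```python
from itertools import groupby

# Residue classes as listed in the description (one-letter codes).
HYDROPHOBIC = set("AVIMLFWY")
HYDROPHILIC = set("DEKRHSTQN")


def residue_class(residue: str) -> str:
    """Classify a residue as hydrophobic, hydrophilic or other (e.g. G, P, C)."""
    if residue in HYDROPHOBIC:
        return "hydrophobic"
    if residue in HYDROPHILIC:
        return "hydrophilic"
    return "other"


def stretch_fraction(propeptide: str, min_len: int = 2, max_len: int = 5) -> float:
    """Fraction of residues lying in runs of min_len..max_len same-class residues.

    Assumption: a run is a maximal block of consecutive residues of one class.
    """
    sequence = propeptide.upper()
    if not sequence:
        return 0.0
    covered = 0
    for cls, run in groupby(sequence, key=residue_class):
        length = len(list(run))
        if cls != "other" and min_len <= length <= max_len:
            covered += length
    return covered / len(sequence)


def meets_stretch_preference(propeptide: str, threshold: float = 0.40) -> bool:
    """True when at least `threshold` of the sequence lies in qualifying stretches."""
    return stretch_fraction(propeptide) >= threshold


toy_linker = "AVGEEKPSLLG"  # made-up sequence, not taken from any real propeptide
print(stretch_fraction(toy_linker), meets_stretch_preference(toy_linker))
```

Restricting `HYDROPHOBIC` and `HYDROPHILIC` to the narrower preferred sets (A, V, L, M, I and D, E, K, R) gives the stricter variant of the same check.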

It is further preferred that the linker propeptide has within 7 residues of its N- or C- 
terminal cleavage site a sequence with two to five consecutive acidic residues, two to five
basic residues or two to five consecutive intermixed acidic and basic residues. 
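
A minimal sketch of this check follows. It assumes that "within 7 residues of the cleavage site" means the first or last seven residues of the linker, that the charged run must lie wholly inside that window, and that acidic means D or E and basic means K or R; the helper names and the test sequence are hypothetical.

```python
import re

# Two to five consecutive acidic (D, E) and/or basic (K, R) residues, in any mixture.
CHARGED_RUN = re.compile(r"[DEKR]{2,5}")


def charged_runs_near_ends(propeptide: str, window: int = 7) -> dict:
    """Return the charged runs found in the N-terminal and C-terminal windows."""
    sequence = propeptide.upper()
    return {
        "N-terminal": CHARGED_RUN.findall(sequence[:window]),
        "C-terminal": CHARGED_RUN.findall(sequence[-window:]),
    }


def has_charged_run_near_end(propeptide: str, window: int = 7) -> bool:
    """True when either terminal window contains a qualifying charged run."""
    runs = charged_runs_near_ends(propeptide, window)
    return bool(runs["N-terminal"] or runs["C-terminal"])


toy_linker = "SAEEVATGSLKK"  # made-up sequence for illustration only
print(charged_runs_near_ends(toy_linker))
```

The same function covers the dipeptidic case, since a dipeptide of two charged residues is the shortest run the pattern accepts.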

It is especially preferred that at least 40% of the sequence of the linker propeptide for
use in accordance with all aspects of the invention as described herein consists of
stretches of either two to five consecutive hydrophobic residues selected from alanine, valine,
isoleucine, methionine, leucine, phenylalanine, tryptophan and tyrosine or stretches of two to
five hydrophilic residues selected from aspartic acid, glutamic acid, lysine, arginine, histidine,
serine, threonine, glutamine and asparagine and that the propeptide has within 7 residues of its
N- or C-terminal cleavage site a sequence with two to five consecutive acidic residues, two to
five basic residues or two to five consecutive intermixed acidic and basic residues.

The use of linker propeptides rich in the small amino acids A, V, S and T and
containing dipeptidic sequences consisting of either two acidic residues, two basic residues or
one acidic and one basic residue which on translation provide a cleavage site whereby the
expressed polyprotein is post-translationally processed into the component protein molecules
is also preferred.

As used herein the term 'rich' is used to denote that the residues A, V, S and T are
present more frequently than would be expected based on a random distribution of amino
acids.
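
As a worked illustration of this definition, the sketch below compares the observed fraction of A, V, S and T with a baseline. The baseline is an assumption: a uniform distribution over the 20 standard amino acids, under which four residues are expected to make up 4/20 = 20% of a sequence. The function names and test sequences are hypothetical.

```python
SMALL_RESIDUES = set("AVST")

# Assumed baseline: uniform distribution over the 20 standard amino acids.
EXPECTED_FRACTION = len(SMALL_RESIDUES) / 20


def small_residue_fraction(sequence: str) -> float:
    """Observed fraction of A, V, S and T in the sequence."""
    sequence = sequence.upper()
    if not sequence:
        return 0.0
    return sum(residue in SMALL_RESIDUES for residue in sequence) / len(sequence)


def is_rich_in_small_residues(sequence: str,
                              expected: float = EXPECTED_FRACTION) -> bool:
    """True when A, V, S and T occur more often than the assumed baseline."""
    return small_residue_fraction(sequence) > expected


print(is_rich_in_small_residues("AVSTGG"), is_rich_in_small_residues("GGGGGA"))
```

A different baseline, for example the amino acid frequencies of the host proteome, can be passed through the `expected` parameter.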

It is further preferred that the linker propeptides have a dipeptidic sequence within 
seven amino acids from the N- and/or C-terminal ends thereof, the said dipeptidic sequences
consisting of either two acidic residues, two basic residues or an acidic and a basic residue
wherein said dipeptidic sequences may be the same or different at each terminus. 

In a further preferred embodiment said dipeptidic sequences are selected from the 
following: EE, ED and/or KK.

It is particularly desirable that the linker propeptide should hold the two (or more) 
protein domains sufficiently far apart so that they can fold appropriately and independently. It
is further advantageous that the linker propeptide should not interact with any secondary
structural element in the two proteins which it links and should therefore itself have no
particular secondary structure or form a solitary secondary structure element such as an alpha
helix. 

In this and all other aspects and embodiments of the invention described herein the 
linker propeptide sequence providing the cleavage site is preferably isolatable from a plant 
protein, more preferably from the precursor of a plant antimicrobial protein such as a defensin
or a hevein-type antimicrobial peptide (Broekaert et al., 1997, Crit. Rev. Plant Sci. 16, 297-
323). The linker propeptide is most preferably derivable from a defensin and/or a hevein type
antimicrobial peptide, especially from the C-terminal propeptides from Dm-AMP1 and
Ac-AMP2 the sequences of which are as described in Figure 2 herein.

The use of a linker propeptide derived from an antimicrobial peptide derived from the 
genus Impatiens is also preferred. The Ib-AMP gene comprises five propeptide regions all of
which are suitable for use in the present invention and which are described fully in Published
International Patent Application WO 95/24486 at pages 29 and 40 to 42, the contents of
which are incorporated herein by reference. All or part of the C-terminal propeptides derived
from the Dm-AMP and Ac-AMP genes may be used.

According to a preferred embodiment the present invention further provides a method 
for the expression of multiple proteins in a transgenic plant comprising inserting into the 
genome of said plant a DNA sequence comprising a promoter region operably linked to a 
signal sequence said signal sequence being operably linked to two or more protein encoding 
regions and a 3' -terminator region wherein said protein encoding regions are separated from 
each other by a DNA sequence coding for a linker propeptide wherein the linker propeptide is 
derivable from a defensin and/or a hevein type antimicrobial peptide said propeptide providing 
a cleavage site whereby the expressed polyprotein is post-translationally processed into the 
component protein molecules. 

The use of the C-terminal propeptides from Dm-AMP1 and Ac-AMP2 as described in
Figure 2 herein as cleavable linkers, i.e. to provide a cleavable linkage site, is particularly
preferred. Depending on the choice of propeptide it may be necessary to engineer an
additional specific protease recognition site at either or both ends to facilitate cleavage of the
sequence. Suitable specific protease recognition sites include, for example, recognition sites
for subtilisin-like proteases recognising either a dipeptidic sequence consisting of two basic
residues, a tetrapeptidic sequence consisting of a hydrophobic residue, any residue, a basic
residue and a basic residue, or a tetrapeptidic sequence consisting of a basic residue, any
residue, a basic residue and a basic residue. Subtilisin-like protease recognition sites are
particularly preferred for use in the method of the invention.
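
The three recognition-site patterns named above can be located in a candidate sequence with regular expressions. The sketch below is illustrative only: it assumes that basic means K or R and that hydrophobic means the residues A, V, I, M, L, F, W and Y listed elsewhere in this description, and the function name and test sequence are hypothetical.

```python
import re

BASIC = "KR"               # assumption: lysine and arginine
HYDROPHOBIC = "AVIMLFWY"   # hydrophobic residues as listed in this description

SITE_PATTERNS = {
    "basic-X-basic-basic": f"[{BASIC}].[{BASIC}]{{2}}",
    "hydrophobic-X-basic-basic": f"[{HYDROPHOBIC}].[{BASIC}]{{2}}",
    "dibasic": f"[{BASIC}]{{2}}",
}


def find_recognition_sites(sequence: str) -> list:
    """Return (pattern name, start index, matched residues), sorted by start index.

    A lookahead is used so that overlapping sites are all reported.
    """
    sequence = sequence.upper()
    hits = []
    for name, pattern in SITE_PATTERNS.items():
        for match in re.finditer(f"(?=({pattern}))", sequence):
            hits.append((name, match.start(), match.group(1)))
    return sorted(hits, key=lambda hit: hit[1])


toy_sequence = "GLSKRAA"  # made-up sequence for illustration only
for hit in find_recognition_sites(toy_sequence):
    print(hit)
```

Every tetrapeptidic site necessarily ends in a dibasic pair, so the dibasic pattern also reports a hit inside each tetrapeptide match.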

According to a yet further preferred embodiment the present invention further provides 
a method for the expression of multiple proteins in a transgenic plant comprising inserting into 
the genome of said plant a DNA sequence comprising a promoter region operably linked to a
signal sequence said signal sequence being operably linked to two or more protein encoding 
regions and a 3' -terminator region wherein said protein encoding regions are separated from 
each other by a DNA sequence coding for a linker propeptide said propeptide providing a 
cleavage site whereby the expressed polyprotein is post-translationally processed into the 
component protein molecules and wherein an additional specific protease recognition site has
been engineered at either or both ends of said linker propeptide to facilitate cleavage of the 
sequence. 

According to a yet further preferred embodiment the present invention further provides 
a method for the expression of multiple proteins in a transgenic plant comprising inserting into
the genome of said plant a DNA sequence comprising a promoter region operably linked to a
signal sequence said signal sequence being operably linked to two or more protein encoding
regions and a 3'-terminator region wherein said protein encoding regions are separated from
each other by a DNA sequence coding for a linker propeptide wherein the linker propeptide is
derivable from a defensin and/or a hevein type antimicrobial peptide said propeptide providing
a cleavage site whereby the expressed polyprotein is post-translationally processed into the
component protein molecules and wherein an additional specific protease recognition site has 
been engineered at either or both ends of said linker propeptide to facilitate cleavage of the 
sequence. 

The invention further provides the use of propeptides isolatable from plant derived 
proteins as cleavable linkers in polyprotein precursors synthesised via the secretory pathway in
transgenic plants. The propeptides are preferably isolatable from the precursor of a plant
defensin or a hevein-type antimicrobial peptide (Broekaert et al., 1997, Crit. Rev. Plant Sci. 16,
297-323). The propeptides may also preferably be isolatable from an antimicrobial peptide
derived from the genus Impatiens.

In a further aspect the invention provides the use of a propeptide wherein at least 40%
of the sequence of the propeptide consists of stretches of either two to five consecutive
hydrophobic residues selected from alanine, valine, isoleucine, methionine, leucine,
phenylalanine, tryptophan and tyrosine or stretches of two to five hydrophilic residues selected
from aspartic acid, glutamic acid, lysine, arginine, histidine, serine, threonine, glutamine and 
asparagine as a cleavable linker in polyprotein precursors synthesised via the secretory 
pathway in transgenic plants. 

It is further preferred that the linker propeptide has within 7 residues of its N- or C- 
terminal cleavage site a sequence with two to five consecutive acidic residues, two to five 
basic residues or two to five consecutive intermixed acidic and basic residues. 

It is especially preferred that at least 40% of the sequence of the linker propeptide 
consists of stretches of either two to five consecutive hydrophobic residues selected from 
alanine, valine, isoleucine, methionine, leucine, phenylalanine, tryptophan and tyrosine or 
stretches of two to five hydrophilic residues selected from aspartic acid, glutamic acid, lysine, 
arginine, histidine, serine, threonine, glutamine and asparagine and has within 7 residues of its 
N- or C- terminal cleavage site a sequence with two to five consecutive acidic residues, two to 
five basic residues or two to five consecutive intermixed acidic and basic residues. 

In a further aspect the invention provides the use of a peptide sequence rich in the 
small amino acids A, V, S and T and containing dipeptidic sequences consisting of either two 
acidic residues, two basic residues or one acidic and one basic residue as a cleavable linker 
sequence wherein said sequence is isolatable from a plant defensin or a hevein-type 
antimicrobial protein. 

The methods of the invention may be used to achieve efficient expression and secretion 
of any desired proteins and are particularly suitable for the expression of proteins which must
naturally be synthesised in the secretory pathway in order to be folded in a functional form 
such as, for example, glycosylated proteins and those with disulphide bridges. Additionally, it 
is extremely advantageous for proteins involved in the defence of a plant to attack by a 
pathogen to be secreted efficiently to the extracellular space since this is usually the initial site 
of pathogen attack and the present methods of the invention provide an effective means of 
delivering multiple proteins extracellularly. 

The method of the invention is also particularly suitable for producing small peptides 
which may then be used for immunisation purposes i.e. the transgenic plant or a seed derived 
therefrom may be used directly as a foodstuff thereby passively immunising the recipient. 

Examples of proteins which may be expressed according to the methods of the present 
invention include, for example, antifungal proteins described in Published International Patent 
Application Nos WO92/15691, WO92/21699, WO93/05153, WO93/04586, WO94/11511,
WO95/04754, WO95/18229, WO95/24486, WO97/21814 and WO97/21815 including
Rs-AFP1, Rs-AFP2, Dm-AMP1, Dm-AMP2, Hs-AFP1, Ah-AMP1, Ct-AMP1, Ct-AMP2,
Bn-AFP1, Bn-AFP2, Br-AFP1, Br-AFP2, Sa-AFP1, Sa-AFP2, Cb-AMP1, Cb-AMP2, Ca-AMP1,
Bm-AMP1, Ace-AMP1, Ac-AMP1, Ac-AMP2, Mj-AMP1, Mj-AMP2, Ib-AMP1, Ib-AMP2,
Ib-AMP3, Ib-AMP4, PR-1 type proteins such as chitinases, glucanases such as beta 1,3 and
beta 1,6 glucanases, chitin-binding lectins, zeamatins, osmotins, thionins and ribosome-
inactivating proteins and peptides derived therefrom or antifungal proteins showing 85%
sequence identity, preferably greater than 90% sequence identity, more preferably greater than 
95% sequence identity with any of said proteins. 

In the context of the present invention, two amino acid sequences with at least 
85% similarity to each other have at least 85% similar (identical or conservatively 
replaced) amino acid residues in a like position when aligned optimally allowing for up to 
3 gaps, with the proviso that in respect of the gaps a total of not more than 15 amino acid 
residues is affected. Likewise, two amino acid sequences with at least 90% similarity to 
each other have at least 90% identical or conservatively replaced amino acid residues in a 
like position when aligned optimally allowing for up to 3 gaps with the proviso that in 
respect of the gaps a total of not more than 15 amino acid residues is affected. 

For the purpose of the present invention, a conservative amino acid is defined as 
one which does not alter the activity/function of the protein when compared with the 
unmodified protein. In particular, conservative replacements may be made between amino 
acids within the following groups: 

(i) Alanine, Serine, Glycine and Threonine 

(ii) Glutamic acid and Aspartic acid 

(iii) Arginine and Lysine 

(iv) Isoleucine, Leucine, Valine and Methionine 

(v) Phenylalanine, Tyrosine and Tryptophan 
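
By way of illustration only, the similarity definition and the conservative replacement groups (i) to (v) above may be sketched in Python as follows. The sketch assumes that the two sequences have already been optimally aligned by one of the published methods, that gaps are written as "-", that similarity is expressed relative to the aligned (non-gap) positions, and that a gap is a maximal run of alignment columns containing a gap character. These are assumptions made for the sketch, as are the function names and the short test sequences; none of them forms part of the definition itself.

```python
# Conservative replacement groups (i) to (v) as listed in the text.
CONSERVATIVE_GROUPS = [
    set("ASGT"),  # (i) Alanine, Serine, Glycine, Threonine
    set("ED"),    # (ii) Glutamic acid, Aspartic acid
    set("RK"),    # (iii) Arginine, Lysine
    set("ILVM"),  # (iv) Isoleucine, Leucine, Valine, Methionine
    set("FYW"),   # (v) Phenylalanine, Tyrosine, Tryptophan
]


def is_similar(a, b):
    """True if two residues are identical or fall within the same group."""
    if a == b:
        return True
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)


def percent_similarity(aligned_a, aligned_b, gap="-"):
    """Percentage of aligned, non-gap positions that are similar.

    Assumption: the denominator is the number of columns without a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != gap and b != gap]
    if not pairs:
        return 0.0
    similar = sum(1 for a, b in pairs if is_similar(a, b))
    return 100.0 * similar / len(pairs)


def gap_stats(aligned_a, aligned_b, gap="-"):
    """Return (number of gaps, number of positions affected by gaps)."""
    n_gaps = 0
    affected = 0
    in_gap = False
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            affected += 1
            if not in_gap:
                n_gaps += 1
            in_gap = True
        else:
            in_gap = False
    return n_gaps, affected


def meets_threshold(aligned_a, aligned_b, threshold=85.0):
    """Apply the similarity threshold with at most 3 gaps and 15 gap positions."""
    n_gaps, affected = gap_stats(aligned_a, aligned_b)
    return (n_gaps <= 3 and affected <= 15
            and percent_similarity(aligned_a, aligned_b) >= threshold)
```

For example, `meets_threshold(seq_a, seq_b, threshold=90.0)` would apply the 90% criterion to a pair of pre-aligned sequences.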

Sequence similarity may be calculated using sequence alignment algorithms known in 
the art such as, for example, the Clustal Method described by Myers and Miller (Comput.
Appl. Biosci. 4, 11-17 (1988)) and Wilbur and Lipman (Proc. Natl. Acad. Sci. USA 80,
726-30 (1983)) and the Waterman and Eggert method (The Journal of Molecular Biology (1987)
197, 723-728). The MegAlign Lipman Pearson one pair method (using default parameters)
which may be obtained from DNAstar Inc, 1228 South Park Street, Madison, Wisconsin, 53715,
USA as part of the Lasergene system may also be used.

The cleavable linkers are used to join two or more proteins of interest and provide 
cleavage sites whereby the polyprotein is post-translationally processed into the component 
protein molecules. 

In a further aspect the invention provides a DNA construct comprising a DNA 
sequence comprising a promoter region operably linked to a plant derived signal sequence said 
signal sequence being operably linked to two or more protein encoding regions and a 3'- 
terminator region wherein said protein encoding regions are separated from each other by a 
DNA sequence coding for a linker propeptide said propeptide providing a post-translational
cleavage site. 

The invention does not extend to the use of a DNA construct in the expression of 
multiple proteins in a transgenic plant where when said propeptide linker is derived from the 
Ib-AMP gene as described in SEQ IDs 14, 15, 16, 17 or 18 of published International patent 
Application no. WO 95/24486 said protein encoding regions encode only three copies of Rs- 
AFP2. 

In a preferred embodiment of this aspect the invention provides a DNA construct 
wherein said DNA sequence encoding said linker propeptide encodes an internal propeptide 
from the Ib-AMP gene. In a further preferred embodiment of this aspect the invention 
provides a DNA construct wherein said DNA sequence encoding said linker propeptide 
encodes the C-terminal propeptide from the Dm-AMP or from the Ac-AMP gene. 

In a particularly preferred embodiment the invention provides a DNA construct as 
described above wherein when the DNA sequence encoding the linker propeptide is derived 
from the Dm-AMP gene or from the Ac-AMP gene it additionally comprises one or more 
protease recognition sites at either or both ends thereof. 

In a further aspect the invention provides a DNA construct comprising a DNA 
sequence comprising a promoter region operably linked to two or more protein encoding 
regions and a 3' terminator-region wherein said protein encoding regions are separated from 
each other by a DNA sequence coding for a linker propeptide encoding the C-terminal 
propeptide from the Dm-AMP gene or from the Ac-AMP gene said propeptide providing a
post-translational cleavage site.

In a particularly preferred embodiment the invention provides a DNA construct as
described above wherein the DNA sequence encoding the linker propeptide from Dm-AMP or
Ac-AMP additionally comprises one or more protease recognition sites at either or both ends
thereof.

In a yet further aspect the invention provides a transgenic plant transformed with a 
DNA construct according to any of the above aspects of the invention. 

In a further aspect the invention provides a transgenic plant transformed with a DNA 
sequence comprising a promoter region operably linked to a signal sequence said signal 
sequence being operably linked to two or more protein encoding regions and a 3'-terminator 
region wherein said protein encoding regions are separated from each other by a DNA 
sequence coding for a linker propeptide which on translation provides a cleavage site. 

The invention does not extend to a transgenic plant where when the protein encoding 
regions are separated by a linker propeptide derived from the Impatiens gene as described in 
SEQ ID Nos. 14, 15, 16, 17 or 18 of published International Patent application No. WO 
95/24486 they encode three copies of the Rs-AFP2 protein. 

In a preferred embodiment of this aspect at least 40% of the sequence of the said linker 
propeptide consists of stretches of either two to five consecutive hydrophobic residues selected 
from alanine, valine, isoleucine, methionine, leucine, phenylalanine, tryptophan and tyrosine 
or stretches of two to five hydrophilic residues selected from aspartic acid, glutamic acid, 
lysine, arginine, histidine, serine, threonine, glutamine and asparagine. 

The said hydrophobic residues are preferably alanine, valine, leucine, methionine 
and/or isoleucine and the said hydrophilic residues are preferably aspartic acid, glutamic acid, 
lysine and/or arginine. 
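
The 40% criterion above may be illustrated by the following minimal Python sketch. It assumes that a "stretch" is a maximal run of residues of the same class, that only runs of two to five residues count towards the total, and that glycine, proline and cysteine (which appear in neither list) belong to neither class. The function names and the example sequence are hypothetical.

```python
from itertools import groupby

# Residue classes as listed in the text.
HYDROPHOBIC = set("AVIMLFWY")
HYDROPHILIC = set("DEKRHSTQN")


def residue_class(residue):
    """Classify a one-letter residue code."""
    if residue in HYDROPHOBIC:
        return "hydrophobic"
    if residue in HYDROPHILIC:
        return "hydrophilic"
    return "other"  # assumption: e.g. G, P and C belong to neither class


def stretch_fraction(linker, min_len=2, max_len=5):
    """Fraction of the linker lying in hydrophobic or hydrophilic stretches.

    Assumption: only maximal runs of min_len to max_len residues are counted.
    """
    if not linker:
        raise ValueError("linker sequence must not be empty")
    covered = 0
    for cls, run in groupby(linker, key=residue_class):
        run_length = len(list(run))
        if cls != "other" and min_len <= run_length <= max_len:
            covered += run_length
    return covered / len(linker)


def meets_stretch_criterion(linker, minimum=0.40):
    """True if at least 40% of the linker consists of qualifying stretches."""
    return stretch_fraction(linker) >= minimum
```

For the hypothetical sequence `"AAGDEKPL"`, the runs AA and DEK qualify and the isolated L does not, so five of the eight residues (62.5%) are counted.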

It is further preferred that the linker propeptide has within 7 residues of its N- or C- 
terminal cleavage site a sequence with two to five consecutive acidic residues, two to five 
basic residues or two to five consecutive intermixed acidic and basic residues. 
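
The preference for charged residues near the cleavage sites may likewise be sketched as below. The sketch assumes that the cleavage sites coincide with the two ends of the linker sequence supplied, that acidic residues are D and E and basic residues are K and R, and that a longer charged run still satisfies the test because it contains a run of two to five residues. The function name and the test sequences are hypothetical.

```python
import re

# Assumption: acidic = D, E and basic = K, R. Two to five consecutive acidic,
# basic or intermixed residues all reduce to a run of charged residues.
CHARGED_RUN = re.compile(r"[DEKR]{2,5}")


def has_charged_run_near_end(linker, window=7):
    """True if a charged run lies within `window` residues of either end.

    Assumption: the N- and C-terminal cleavage sites are the ends of `linker`.
    """
    n_terminal_region = linker[:window]
    c_terminal_region = linker[-window:]
    return bool(CHARGED_RUN.search(n_terminal_region)
                or CHARGED_RUN.search(c_terminal_region))
```

A run spanning the boundary of the seven-residue window is only detected here if at least two of its residues fall inside the window.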

It is especially preferred that at least 40% of the sequence of the linker propeptide 
consists of stretches of either two to five consecutive hydrophobic residues selected from 
alanine, valine, isoleucine, methionine, leucine, phenylalanine, tryptophan and tyrosine or 
stretches of two to five hydrophilic residues selected from aspartic acid, glutamic acid, lysine, 
arginine, histidine, serine, threonine, glutamine and asparagine and has within 7 residues of its
N- or C-terminal cleavage site a sequence with two to five consecutive acidic residues, two to
five basic residues or two to five consecutive intermixed acidic and basic residues.

In a further preferred embodiment of this aspect of the invention the DNA sequence 
providing the cleavage site encodes a peptide sequence rich in the small amino acids A, V, S 
and T and containing dipeptidic sequences consisting of either two acidic residues, two basic 
residues or one acidic and one basic residue. 

In a particularly preferred embodiment of this aspect of the invention the DNA
sequence providing the cleavage site encodes a propeptide derived from the Ib-AMP gene such
as for example that described in Figure 2. In a further particularly preferred embodiment of
this aspect of the invention the DNA sequence providing the cleavage site encodes the
C-terminal propeptides from Dm-AMP1 and Ac-AMP2 as described in Figure 2 which may
optionally be engineered to include a further DNA sequence encoding a subtilisin-like protease
recognition site.

In a further aspect the invention provides a vector comprising a DNA construct as 
described above. 

Unexpectedly, expression levels of plant defensins in plants transformed with a 
polyprotein precursor construct were found to be much higher compared to those in plants 
transformed with single plant defensin constructs. Hence, the processing system described 
here can be used not only to co-express two or more different proteins, but also to obtain 
higher expression levels of a protein, particularly of small proteins. The reason for the 
observed stimulatory effect on translational efficiency is currently unclear. It might be due to 
an effect of mRNA length or length of primary translation product on translational efficiency. 

In a further aspect the invention therefore provides a method of improving expression 
levels of a protein in a transgenic plant comprising inserting into the genome of said plant a 
DNA sequence comprising a promoter region operably linked to two or more protein encoding 
regions and a 3'-terminator region wherein said protein encoding regions are separated from
each other by a DNA sequence coding for a linker propeptide said propeptide providing a 
cleavage site whereby the expressed polyprotein is post-translationally processed into the 
component protein molecules. 

The method for improving the expression level of proteins described herein does not 
cover the use of a linker propeptide derived from the Ib-AMP gene as described in SEQ ID nos
14, 15, 16, 17 or 18 described in Published International patent application no. WO 95/24486
separating three protein encoding regions each of which encodes Rs-AFP2 and the insertion
thereof into a plant genome.

In a further preferred embodiment of this aspect there is provided a method of 
improving expression levels of a protein in a transgenic plant comprising inserting into the 
genome of said plant a DNA sequence comprising a promoter region operably linked to a 
signal sequence said signal sequence being operably linked to two or more protein encoding 
regions and a 3'-terminator region wherein said protein encoding regions are separated from
each other by a DNA sequence coding for a linker propeptide said propeptide providing a 
cleavage site whereby the expressed polyprotein is post-translationally processed into the 
component protein molecules. 

This method of the invention is particularly suitable for the expression of proteins 
which are 100 amino acids or less in length. 

As will be readily apparent to a man skilled in the art the sequence of the individual 
components of the DNA sequence i.e. the signal sequence, promoter sequence, linker 
sequence, protein sequence(s), terminator sequence for use in the methods according to the 
invention may be predicted from its known amino acid sequence and DNA encoding the 
protein may be manufactured using a standard nucleic acid synthesiser. Alternatively, DNA 
encoding the components of the invention may be produced by appropriate isolation from 
natural sources. 

The invention is further illustrated with reference to the following non-limiting 
examples and figures in which 

Figure 1: shows nucleotide sequence and corresponding amino acid sequence of coding
region of the DmAMP1 gene. The amino acids corresponding to mature DmAMP1 are
underlined. The nucleotides corresponding to the intron are double underlined.

Figure 2: shows schematic representation of the coding regions from the vector constructs.
Amino acid sequences below the internal propeptides represent the propeptide sequences
from which the linker propeptides were derived.

Figure 3: shows schematic representation of plant transformation vector pFAJ3105 
Figure 4: shows schematic representation of plant transformation vector pFAJ3106
Figure 5: shows schematic representation of plant transformation vector pFAJ3107
Figure 6: shows schematic representation of plant transformation vector pFAJ3108
Figure 7: shows schematic representation of plant transformation vector pFAJ3109

Figure 8: shows nucleotide sequence and corresponding amino acid sequence of the open
reading frame of the region comprised between the NcoI and SacI sites of plasmid pFAJ3105.
The amino acids corresponding to mature DmAMP1 and mature RsAFP2 are underlined and
double-underlined, respectively.

Figure 9: shows nucleotide sequence and corresponding amino acid sequence of the open
reading frame of the region comprised between the NcoI and SacI sites of plasmid pFAJ3106.
The amino acids corresponding to mature DmAMP1 and mature RsAFP2 are underlined and
double-underlined, respectively.

Figure 10: shows nucleotide sequence and corresponding amino acid sequence of the open
reading frame of the region comprised between the NcoI and SacI sites of plasmid pFAJ3107.
The amino acids corresponding to mature DmAMP1 and mature RsAFP2 are underlined and
double-underlined, respectively.

Figure 11: shows nucleotide sequence and corresponding amino acid sequence of the open
reading frame of the region comprised between the NcoI and SacI sites of plasmid pFAJ3108.
The amino acids corresponding to mature DmAMP1 and mature RsAFP2 are underlined and
double-underlined, respectively.

Figure 12: shows nucleotide sequence and corresponding amino acid sequence of the open
reading frame of the region comprised between the NcoI and SacI sites of plasmid pFAJ3109.
The amino acids corresponding to mature DmAMP1 are underlined.

Figure 13: shows the DmAMP1 expression levels (as % of total soluble protein) of a series
of transgenic individual plants transformed with construct pFAJ3105 and a series of transgenic
individuals transformed with construct pFAJ3109.

Figure 14: shows RP-HPLC analysis on a C8-silica column of crude extracts from leaves
transformed with construct pFAJ3105 (A) or pFAJ3106 (B). Extracts were prepared as
described in Materials and Methods. The column was eluted with a gradient of acetonitrile in
0.1 % TFA (0-35 min, 15 % - 50 % acetonitrile in 0.1 % TFA). The eluate was monitored
on-line for measurement of the absorbance at 214 nm (top trace), fractionated, and subjected to
ELISA assays for DmAMP1 (lower bar graph, black bars) and RsAFP2 (lower bar graph, white
bars). The elution positions of authentic DmAMP1 and RsAFP2 are indicated with arrows on
the A214 chromatograms.

Figure 15 shows: RPC of the extracellular fluid fraction of Arabidopsis plants transformed
with construct 3105 (line 14). RPC was performed on a C8-silica column (Microsorb-MV, 4.6
x 250 mm, Rainin) equilibrated with 0.1 % trifluoroacetic acid (TFA). After loading the
column was eluted at a flow rate of 1 ml/min for 20 min with 0.1 % TFA, whereafter a 35 min
linear gradient was applied from 15 to 50 % acetonitrile in 0.1 % TFA. Absorbance (full line)
was measured on-line at 280 nm and acetonitrile concentration (dashed line) was measured
on-line with a conductivity monitor. Fractions were collected and assessed for DmAMP1-CRP
and RsAFP2-CRP using ELISA assays. Peak numbers in bold indicate presence of
DmAMP1-CRP, peak numbers in italic indicate presence of RsAFP2-CRP.

Figure 16 shows: RPC of an extract of Arabidopsis plants transformed with construct 3105
(line 14). Samples were two different fractions from IEC showing presence of either
DmAMP1-CRPs or RsAFP2-CRPs, namely those fractions eluting between 0.17 - 0.33 M
NaCl (A), and 0.33 - 0.49 M NaCl (B). RPC was performed as in the legend to Figure 14.
Absorbance (full line) was measured on-line at 280 nm and acetonitrile concentration (dashed
line) was measured on-line with a conductivity monitor. Fractions were collected and assessed
for DmAMP1-CRP or RsAFP2-CRP using ELISA assays. Peak numbers in bold indicate
presence of DmAMP1-CRP, peak numbers in italic indicate presence of RsAFP2-CRP.

MVN VS GELC . . . .FyC SNAAD^VATPEDVE^G OKL. . .FPC 

Figure 17 shows the amino acid sequence of the polyprotein precursors encoded by construct
3105. Dashes indicate omission from the full sequence for sake of brevity. The sequence in
italic is the DmAMP1 leader peptide, the underlined sequence is mature DmAMP1, the bold
sequence is the internal propeptide, the double underlined sequence is mature RsAFP2.
Arrows indicate processing sites according to the N-terminal sequence and MALDI-TOF
analyses of purified DmAMP1-CRPs and RsAFP2-CRPs.



EXAMPLES

MATERIALS AND METHODS 

Cloning of DmAMPl cDNA and DmAMPl gene 

Cloning procedures and polymerase chain reaction (PCR) procedures were performed
following standard protocols (Sambrook et al., 1989, Molecular Cloning: a laboratory manual,
2nd edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY). A cDNA library
was constructed from near-dry seeds collected from flowers of Dahlia merckii. Total RNA
was purified from the seeds using the method of Jepson I. et al. (1991, Plant Mol. Biol.
Reporter 9, 131-138). 0.6 mg of total RNA was obtained from 2 g of D. merckii seed.
PolyATract magnetic beads (Promega) were used to isolate approximately 2 µg poly-A+ RNA
from 0.2 mg of total RNA.
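
The yields quoted above correspond to about 0.3 mg of total RNA per gram of seed and a poly-A+ fraction of about 1 % of the total RNA used. The arithmetic may be reproduced with the short Python sketch below, in which the function names are merely illustrative.

```python
def rna_yield_mg_per_g(total_rna_mg, tissue_g):
    """Total RNA recovered per gram of starting tissue."""
    return total_rna_mg / tissue_g


def poly_a_percent(poly_a_ug, total_rna_mg):
    """Poly-A+ RNA recovered, as a percentage of the total RNA input."""
    total_rna_ug = total_rna_mg * 1000.0  # convert mg to micrograms
    return 100.0 * poly_a_ug / total_rna_ug


# Figures taken from the text: 0.6 mg total RNA from 2 g seed,
# approximately 2 micrograms poly-A+ RNA from 0.2 mg total RNA.
seed_yield = rna_yield_mg_per_g(0.6, 2.0)
poly_a_fraction = poly_a_percent(2.0, 0.2)
```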

The poly-A+ RNA was used to construct a cDNA library using a ZAP-cDNA synthesis kit 
(Stratagene). Following first and second strand synthesis, cDNAs were ligated with vector 
DNA. After phage assembly using Gigapack Gold (Stratagene) packaging extracts, 
approximately 1 x 10^5 plaque forming units (pfu) were obtained.

Using oligonucleotides AFP-5 (5'-TG(T,C)GANAANGCN(A,T)(G,C)NAA(A,G)ACNTGG,
based on the N-terminal sequence CEKASKTW of DmAMP1; Osborn R.W. et al., 1995,
FEBS Lett. 368, 257-262) and AFP-3EX (5'-CA(A,G)TT(A,G)AANTANCANAAA(A,G)CACAT,
based on the C-terminal sequence MCFCYFNC of DmAMP1) and genomic DNA
isolated from D. merckii leaves, a 144 bp PCR product was produced and isolated from an
agarose gel. The PCR product was cloned into pBluescript. The inserts of 10 transformants
were sequenced. The sequences represented 3 closely homologous DmAMP1-like genes one
of which, PCR clone 4, encoded the observed mature DmAMP1. The 144 bp PCR product
mixture labelled with 32P-CTP was used to probe Hybond N (Amersham) filter lifts made
from plates containing a total of 6 x 10^4 pfu of the cDNA library. Thirty potentially positive
signals were observed. 22 plaques were picked and taken through two further rounds of
screening. After in vivo excision 13 clones were characterised by DNA sequencing.
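
The degenerate primer notation used above, in which N denotes any of the four bases and (X,Y) denotes a choice between the bases listed, may be handled as in the following Python sketch. It parses the AFP-5 primer as written in the text, counts the number of distinct oligonucleotides in the primer mixture and checks whether a given sequence is covered. The function names are illustrative, and the coding sequence used in the check is one hypothetical set of codons for CEKASKTW, not the D. merckii sequence.

```python
import re

# AFP-5 as written in the text, without the 5' prefix.
AFP5 = "TG(T,C)GANAANGCN(A,T)(G,C)NAA(A,G)ACNTGG"


def parse_degenerate(primer):
    """Split the notation into one set of allowed bases per position."""
    positions = []
    for token in re.findall(r"\([ACGT,]+\)|[ACGTN]", primer):
        if token == "N":
            positions.append(set("ACGT"))            # any base
        elif token.startswith("("):
            positions.append(set(token[1:-1].split(",")))  # listed choice
        else:
            positions.append({token})                # fixed base
    return positions


def degeneracy(primer):
    """Number of distinct sequences represented by the degenerate primer."""
    total = 1
    for allowed in parse_degenerate(primer):
        total *= len(allowed)
    return total


def matches(primer, sequence):
    """True if `sequence` is one of the sequences the primer represents."""
    positions = parse_degenerate(primer)
    if len(positions) != len(sequence):
        return False
    return all(base in allowed for base, allowed in zip(sequence, positions))
```

Parsed in this way, AFP-5 spans 24 positions, i.e. the eight codons of CEKASKTW.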
Four classes of DmAMP related peptides were encoded by the 13 cDNA clones. Three
versions of the DmAMP mature protein region were represented in the four classes. One of
the classes (Dm2.5 type) contained a mature protein region which may correspond to
DmAMP2 (Osborn R.W. et al., 1995, FEBS Lett. 368, 257-262). None of the cDNAs encoded
a mature protein region equivalent to the observed mature DmAMP1 peptide sequence.
Using the sequence of PCR clone 4 (above) and information from the N- and C-terminal ends
of the peptides deduced from cDNA sequences, two pairs of oligonucleotides were designed
for amplification of a gene encoding DmAMP1. Genomic DNA from D. merckii was used in
a PCR reaction with oligonucleotides MATAFP-5P (5'-ATGGC(C,G)AAN(A,C)(A,G)NTC
(A.OGTTGCNTT) and MATAFP-5 (5'-AAACACATGTGTTTCCCATT); the PCR product
was cloned into pBluescript and clones were sequenced. A clone containing the 5' half of a
DmAMP1 gene was identified. Genomic DNA from D. merckii was used in a PCR reaction
with MATAFP-3 (5'-AGCGTGTCATGTGCGTAAT) and DM25MAT-3 (5'-TAAAGA
AACCGACCCTTTCACGG); the PCR product was cloned into pBluescript and clones were
sequenced. A clone containing the 3' half of a DmAMP1 gene was identified. The 5' and 3'
sections of the mature gene were combined to assemble the sequence of the coding region of
the DmAMP1 gene (Figure 1).

The DmAMP1 gene encodes a precursor with a 28 amino acid leader peptide, a 50 amino
acid mature protein and a 40 amino acid C-terminal propeptide. The open reading frame is
interrupted by a 92 bp intron located within the leader peptide region.
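
The domain lengths given above imply the overall sizes worked out in the following Python sketch. It assumes that the three domains account for the whole precursor and that the open reading frame ends with a single stop codon; the constant and function names are illustrative only.

```python
# Domain lengths (amino acids) and intron length (bp) as stated in the text.
LEADER_AA = 28
MATURE_AA = 50
PROPEPTIDE_AA = 40
INTRON_BP = 92


def precursor_length_aa():
    """Length of the full precursor, assuming no further domains."""
    return LEADER_AA + MATURE_AA + PROPEPTIDE_AA


def coding_length_bp(include_stop=True):
    """Length of the spliced coding sequence (assumption: one stop codon)."""
    stop_bp = 3 if include_stop else 0
    return 3 * precursor_length_aa() + stop_bp


def genomic_length_bp():
    """Start codon to stop codon in the gene, including the intron."""
    return coding_length_bp(include_stop=True) + INTRON_BP
```

Under these assumptions the precursor is 118 amino acids long, the spliced coding sequence is 357 bp including the stop codon, and the corresponding genomic region is 449 bp.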
To eliminate the intron from the DmAMP1 gene sequence and to allow cloning of the
DmAMP1 encoding region, either with or without the C-terminal propeptide region, into an
expression cassette vector, two PCR reactions were carried out with respectively the primer
sets DMVEC-3 (5'-ATGCATCCATGGTGAATCGGTCGGTTGCGTTCTCCGCGTTCGTT
CTGATCCTTTTCGTGCTCGCCATCTCAGATATCGCATCCGTTAGTGGAGAACTATG
CGAGAAA) and DMVEC-2 (5'-AAACCGACCGAGCTCACGGATGTTCAACGTTTGGA
AC), and DMVEC-3 and DMVEC-4 (5'-AGCAAGCTTTTCGGGAGCTCAACAATTGA
AGTAA). DMVEC-3 primes at the top strand of the DmAMP1 gene, corresponds to the
leader peptide region without the intron and introduces an NcoI site at the translation start.
DMVEC-2 primes at the bottom strand of the DmAMP1 gene at the 3' end of the C-terminal
propeptide region and introduces a SacI site behind the translation stop codon. DMVEC-4
primes at the bottom strand of the DmAMP1 gene at the 3' end of the mature protein region,
fuses a stop codon behind this region and introduces a SacI site behind the stop codon. Both
PCR products were cut with NcoI and SacI which cleaved the PCR products in two fragments
due to an internal NcoI site in the mature protein region. The resulting NcoI-SacI and
NcoI-NcoI fragments were cloned sequentially in plasmid pMJB1. pMJB1 is an expression cassette
vector containing in sequence a HindIII site, the enhanced cauliflower mosaic 35S RNA
(CaMV35S) promoter (Kay R. et al., 1987, Science 236, 1299-1302), a XhoI site, the 5'
untranslated leader sequence of tobacco mosaic virus (TMV) (Gallie D.R. and Walbot V.,
1992, Nucl. Ac. Res. 20, 4631-4638), a polylinker including NcoI, SmaI, KpnI and SacI sites,
the 3' untranslated terminator region of the Agrobacterium tumefaciens nopaline synthase gene
(Bevan M.W. et al., 1983, Nature 304, 184-187) and an EcoRI site. The resulting plasmids
were termed pDMAMPE (leader peptide region, mature protein region and C-terminal
propeptide region) and pDMAMPD (leader peptide region and mature protein region),
respectively. The coding regions were verified by DNA sequencing.

Constructions of plant transformation vectors 

Schematic representations of the plant transformation vectors used in this work, pFAJ3105,
pFAJ3106, pFAJ3107, pFAJ3108 and pFAJ3109, are shown in Figures 3 to 7, respectively.
The nucleotide sequences comprised between the XhoI and SacI sites of these plasmids, which
encompass the regions encoding antimicrobial proteins, are presented in Figures 8 to 12. The
region comprised between the XhoI and SacI sites of plasmid pFAJ3105 (shown in Figure 8)
was constructed following the two-step recombinant PCR protocol of Pont-Kingdon G.A.D.
(1994, Biotechniques 16, 1010-1011). Primers OWB175
(5'-AGGAAGTTCATTTCATTTGG) and OWB278 (5'-GCCTTTGGCACAACTTCTGT

cctc<k:tccacgtcctctggggtagccacctcgtcagcagcgttggaacaattga 

AGTAACAGAAACAC) were used in a first PCR reaction with plasmid pDMAMPE (see
above) as a template. The second PCR reaction was done using as a template plasmid pFRG4
(Terras F.R.G. et al., 1995, Plant Cell 7, 573-588) and as primers a mixture of the PCR
product of the first PCR reaction, primer OWB175 and primer OWB172
(5'-TTAGAGCTCCTATTAACAAGGAAAGTAGC, SacI site underlined). The resulting PCR
product was digested with XhoI and SacI and cloned into the expression cassette vector
pMJB1 (see above). The expression cassette in the resulting plasmid, called pFAJ3099, was
digested with HindIII (flanking the 5' end of the CaMV35S promoter) and EcoRI (flanking the
3' end of the nopaline synthase terminator) and cloned in the corresponding sites of the plant
transformation vector pGPTVbar (Becker D. et al., 1992, Plant Mol. Biol. 20, 1195-1197) to
yield plasmid pFAJ3105.

Plasmids pFAJ3106, pFAJ3107 and pFAJ3108 were constructed analogously except that
primer OWB278 in the first PCR reaction was replaced by the following primers, respectively:
OWB279 (5'-GCCTTTGGCACAACTTCTGCCTCTTTCCGATGAGTTGTTCGGCTTT
AAGTTTGTC); OWB303 (5'-GCCTTTGGCACAACTTCTGCCTCTTTCCG
ATCGGATGTTCAACGTTTGGAACC); OWB304 (5'-GCCTTTGGCACAACTTCTGCCT
CTCTTTCCGATAGTTTTGGTGGCAGCAACATCAGCTTGGTGATCCACAGTAGTACTGG
CACAATTGAAGTAACAGAAACAC).

Plasmid pFAJ3109 was constructed by cloning the HindIII-EcoRI fragment of plasmid
pDMAMPD (see above) into the corresponding sites of plant transformation vector pGPTVbar
(see above).

Plant transformation 

Arabidopsis thaliana ecotype Columbia-0 was transformed using recombinant Agrobacterium
tumefaciens by the inflorescence infiltration method of Bechtold N. et al. (1993, C.R. Acad.
Sci. 316, 1194-1199). Transformants were selected on a sand/perlite mixture subirrigated with
water containing the herbicide Basta (Agrevo) at a final concentration of 5 mg/l for the active
ingredient phosphinothricin.

ELISA assays and protein assays

Antisera were raised in rabbits injected with either RsAFP2 (purified as described in Terras
F.R.G. et al., 1992, J. Biol. Chem. 267, 15301-15309) or DmAMP1 (purified as in Osborn
R.W. et al., 1995, FEBS Lett. 368, 257-262). ELISA assays were set up as competitive type
assays essentially as described by Penninckx I.A.M.A. et al. (1996, Plant Cell 8, 2309-2323).
Coating of the ELISA microtiter plates was done with 50 ng/ml RsAFP2 or DmAMP1 in
coating buffer. Primary antisera were used as 1000- and 2000-fold diluted solutions
(DmAMP1 and RsAFP2, respectively) in 3 % (w/v) gelatin in PBS containing 0.05 % (v/v)
Tween 20.

Total protein content was determined according to Bradford (1976, Anal. Biochem. 72, 248- 
254) using bovine serum albumin as a standard. 

Rough separation of proteins processed from polyprotein precursors 

Arabidopsis leaves were homogenized under liquid nitrogen and extracted with a buffer
consisting of 10 mM NaH2PO4, 15 mM Na2HPO4, 100 mM KCl, 1.5 M NaCl. The
homogenate was heated for 10 min at 85°C and cooled down on ice. The heat-treated extract
was centrifuged for 15 min at 15 000 x g and was injected on a reversed phase high pressure
liquid chromatography (RP-HPLC) column consisting of C8 silica (0.46 cm x 25 cm; Rainin)
equilibrated with 0.1 % (v/v) trifluoroacetic acid (TFA). The column was eluted at 1 ml/min
in a linear gradient in 35 min from 15 % to 50 % (v/v) acetonitrile in 0.1 % (v/v) TFA. The
eluate was monitored for absorbance at 214 nm, collected as 1 ml fractions, evaporated and
finally redissolved in water. The fractions were tested by ELISA assays.
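
The nominal acetonitrile concentration at any point of the gradient described above, and the time window of each 1 ml fraction at 1 ml/min, may be estimated as in the following Python sketch. It assumes that time zero is the start of the gradient and ignores any delay between the pump and the column; the function names are illustrative.

```python
def acetonitrile_percent(t_min, start=15.0, end=50.0, duration_min=35.0):
    """Nominal % (v/v) acetonitrile t_min minutes into the linear gradient.

    Assumption: t_min = 0 is the start of the gradient, with no system delay.
    """
    if not 0.0 <= t_min <= duration_min:
        raise ValueError("time lies outside the gradient")
    return start + (end - start) * t_min / duration_min


def fraction_window_min(fraction_number, flow_ml_per_min=1.0, fraction_ml=1.0):
    """(start, end) time in minutes of a fraction, numbered from 1."""
    width = fraction_ml / flow_ml_per_min
    return ((fraction_number - 1) * width, fraction_number * width)
```

For instance, the tenth fraction covers minutes 9 to 10 of the run, during which the nominal acetonitrile concentration rises from 24 % to 25 %.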

Preparation of extracellular fluid and intracellular extract

Intercellular fluid was collected from Arabidopsis leaves by immersing the leaves in a beaker
containing extraction buffer (10 mM NaH2PO4, 15 mM Na2HPO4, 100 mM KCl, 1.5 M NaCl).
The beaker with the leaves was placed in a vacuum chamber and subjected to six consecutive
rounds of vacuum for 2 min followed by abrupt release of vacuum. The infiltrated leaves were
gently placed in a centrifuge tube on a grid separated from the tube bottom. The intercellular
fluid was collected from the bottom after centrifugation of the tubes for 15 min at 1800 x g.
The leaves were resubjected to a second round of vacuum infiltration and centrifugation and
the resulting (extracellular) fluid was combined with that obtained after the first vacuum
infiltration. After this step the leaves were extracted in a FastPrep (BIO101/Savant)
reciprocal shaker and the extract clarified by centrifugation (10 min at 10,000 x g) and the
resulting supernatant considered as the intracellular extract.

RESULTS

Characterization of transgenic plants and expression analysis 

To explore the possibility of expressing polyprotein precursor genes in plants, four different 
plant transformation vectors were made with the aim to co-express two different cysteine-rich 
plant defensins with antifungal properties, namely RsAFP2 and DmAMP1. The polyprotein
precursor regions of these constructs all featured a leader peptide region derived from the
DmAMP1 cDNA, the mature protein domain of DmAMP1, an internal propeptide region, and
the mature protein domain of RsAFP2. The four constructs differed only in the internal
propeptides (Figure 2):

• construct 3105 has one of the IbAMP internal propeptides as a propeptide separating
DmAMP1 and RsAFP2.

• construct 3106 has a propeptide consisting of a part of the DmAMP1 propeptide and a
putative subtilisin-like protease processing site (IGKR) at its C-terminus.

• construct 3107 is identical to construct 3106 except that the entire DmAMP1 propeptide
was taken.

• construct 3108 has a propeptide consisting of the AcAMP2 propeptide and a putative
subtilisin-like protease processing site (IGKR) at its C-terminus.
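
The domain organisation of the four polyprotein constructs listed above may be summarised in a small data structure, as in the Python sketch below. The class and field names are hypothetical and the entries are descriptive labels taken from the list, not sequences.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PolyproteinConstruct:
    """Descriptive model of one construct (labels only, no sequences)."""
    name: str
    linker: str
    protease_site: Optional[str] = None  # site added at the linker C-terminus
    leader: str = "DmAMP1 leader peptide"
    first_protein: str = "mature DmAMP1"
    second_protein: str = "mature RsAFP2"

    def domain_order(self) -> List[str]:
        """Domains in N- to C-terminal order of the primary translation product."""
        domains = [self.leader, self.first_protein, self.linker]
        if self.protease_site is not None:
            domains.append(self.protease_site)
        domains.append(self.second_protein)
        return domains


CONSTRUCTS = {
    c.name: c
    for c in (
        PolyproteinConstruct("3105", "IbAMP internal propeptide"),
        PolyproteinConstruct("3106", "partial DmAMP1 C-terminal propeptide", "IGKR"),
        PolyproteinConstruct("3107", "entire DmAMP1 C-terminal propeptide", "IGKR"),
        PolyproteinConstruct("3108", "AcAMP2 C-terminal propeptide", "IGKR"),
    )
}
```

Calling `CONSTRUCTS["3106"].domain_order()` lists the leader peptide, mature DmAMP1, the partial propeptide, the IGKR site and mature RsAFP2 in that order.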

The rationale behind constructs 3106, 3107 and 3108 is based on our observations that the
C-terminal propeptides of AcAMP2 and DmAMP1 are cleaved off at their N-terminus when
expressed as AcAMP2- and DmAMP1-preproproteins in tobacco, respectively, while this
processing event does not prevent the mature proteins from being sorted to the apoplast (De
Bolle et al., 1996, Plant Mol. Biol. 31, 993-1008; R.W. Osborn and S. Attenborough, personal
communication). This implies that the processing enzymes are either in the secretory pathway
or in the apoplast. On the other hand, C-terminal cleavage of the internal propeptide in these
constructs should be executed by a subtilisin-like protease, a member of which in yeast (Kex2)
is known to occur in the Golgi apparatus (Wilcox C.A. and Fuller R.S., 1991, J. Cell. Biol.
115, 297), while a member in tomato occurs in the apoplast (Tornero P. et al., 1997, J. Biol.
Chem. 272, 14412-14419). Proteins deposited in the apoplast, the preferred deposition site for
antimicrobial proteins engineered in transgenic plants (Jongedijk E. et al., 1995, Euphytica 85,
173-180; De Bolle et al., 1996, Plant Mol. Biol. 31, 993-1008) are normally synthesized via
the secretory pathway, encompassing the Golgi apparatus.

A construct was also made for expression of only DmAMP1 (construct 3109, Figure 7).
Expression levels of DmAMP1 and RsAFP2 were analysed in leaves taken from a series of T1
transgenic Arabidopsis plants resulting from transformation with the constructs described
above. The results of the expression analyses based on ELISA assays are presented in Table 1.
Most of the tested lines transformed with the polyprotein constructs 3105, 3106, 3107 and
3108 clearly expressed both DmAMP1-CRPs (DmAMP1-crossreactive proteins) and
RsAFP2-CRPs (RsAFP2-crossreactive proteins). There was generally a good correlation between
DmAMP1-CRP and RsAFP2-CRP levels. However, the RsAFP2-CRP levels were generally 2-
to 5-fold lower than the DmAMP1-CRP levels. The ELISA assays for measuring the
RsAFP2-CRPs in the extracts are, however, less reliable than those for the DmAMP1-CRPs. In
RsAFP2 ELISA assays, dilutions of extracts of transgenic plants yielded dose-response curves that
deviated from those obtained for dilutions of standard solutions containing authentic RsAFP2,
indicating that the majority of the RsAFP2-CRPs in the extracts were immunologically not
identical to RsAFP2 itself. Deviations from RsAFP2 standard dose-response curves were
much more pronounced for extracts from plants transformed with constructs 3106, 3107, and
3108 than for those of plants transformed with 3105. None of the extracts showed deviations
from DmAMP1 standards in dose-response curves in DmAMP1 ELISA assays. The
DmAMP1-CRP levels in the lines transformed with the polyprotein constructs 3105, 3106,
3107 or 3108 were generally much higher compared to those in the line transformed with the
single protein construct 3109. This is also illustrated in Figure 13 where DmAMP1-CRP
expression levels are compared for plants transformed with the polyprotein construct 3105 and
plants transformed with the single protein construct 3109. Expression levels as high as 4% of
total protein (e.g. DmAMP1-CRP level in lines 3105-15 and 3105-18, see Table 1) have so far
never been reported in the literature for a peptide expressed in transgenic plants. Hence, the
use of polyprotein constructs appears to result in markedly enhanced expression, which is an
unexpected finding.

Rough separation of proteins processed from polyprotein precursors 
A transgenic line was selected from each of the populations transformed with either 
construct 3105 (line 1) or 3106 (line 2), and the selected lines were further bred to obtain plants 
homozygous for the transgenes. In order to analyse whether DmAMP1 and RsAFP2 were 
correctly processed in these lines, extracts from the plants were prepared as described in 
Materials and Methods and separated by RP-HPLC on a C8-silica column. Fractions were 
collected and assessed for the presence of compounds cross-reacting with antibodies raised 
against either DmAMP1 or RsAFP2 using ELISA assays. 

As shown in figure 15, DmAMP1-CRPs eluted at a position identical or very close to that of 
authentic DmAMP1 in the line transformed with construct 3105 as well as in that transformed 
with construct 3106. Likewise, RsAFP2-CRPs were detected in both the construct 3105 and 
3106 lines at an elution position identical or very close to that of authentic RsAFP2. None of 
the fractions reacted with both the anti-DmAMP1 and anti-RsAFP2 antibodies, indicating that 
an uncleaved fusion protein was not present in the extracts. No cross-reacting compounds 
were observed in a non-transformed line. 

It is concluded that the primary translation products of the transcription units of construct 3105 
(IbAMP internal propeptide as linker peptide) and construct 3106 (partial DmAMP1 
C-terminal propeptide with subtilisin-like protease site as a linker peptide) are somehow 
processed to yield separate DmAMP1-CRPs and RsAFP2-CRPs that appear to be identical or 
very closely related to DmAMP1 and RsAFP2, respectively, based on their chromatographic 
behavior. 

Analysis of the subcellular location of coexpressed plant defensins 

In order to determine whether the coexpressed plant defensins are secreted 
extracellularly or deposited intracellularly, extracellular fluid and intracellular extract fractions 
were obtained from leaves of homozygous transgenic Arabidopsis lines transformed with 
construct 3105 (line 2), 3106 (line 2) or 3108 (line 12). The cytosolic enzyme 
glucose-6-phosphate dehydrogenase was used as a marker to detect contamination of the 
extracellular fluid fraction with intracellular components. As shown in Table 2, 
glucose-6-phosphate dehydrogenase was partitioned in a ratio of about 80/20 between 
intracellular extract fractions and extracellular fluid fractions. In contrast, the majority of the 
DmAMP1-CRP and RsAFP2-CRP content in all transgenic plants tested was found in the 
extracellular fluid fractions. These results indicate that both plant defensins released from the 
polyprotein precursors are deposited primarily in the apoplast. Hence, all processing steps that 
result in cleavage of the polyprotein structure must occur either in the apoplast or along the 
secretory pathway, i.e. in the endoplasmic reticulum, the Golgi apparatus or in vesicles 
trafficking between Golgi and apoplast. 

Table 1: Expression levels of Dm-AMP1 and Rs-AFP2 in transgenic Arabidopsis lines 

Columns: construct (3105, 3106, 3107, 3108, 3109); line; expression level of Dm-AMP1 (%); 
expression level of Rs-AFP2 (%). The construct and line labels of the individual rows are not 
legible; the values of each column are listed in their original order. 

Expression level of Dm-AMP1 (%): 
0.77, 1.13, 0.48, 0.005, 0.36, 0.99, 0.60, 0.13, 0.25, 4.15, 1.35, 0.24, 4.43, 1.18, 0.68, 0.49, 
0.10, 1.82, 0.68, 1.15, 0.20, 0.10, 0.40, 2.64, 0.40, 0.21, 0.06, 0.24, 0.04, 0.75, 0.14, 0.01, 
0.27, 0.47, 3.00, 0.91, 2.04, 0.17, 0.55, 0.16, 0.05, 0.45, 0.19, 0.05, 0.02, 0.20, 0.10, 0.06, 
0.07, 0.003, 0.18 

Expression level of Rs-AFP2 (%): 
0.29, 0.22, 0.20, <0.001, 0.05, 0.25, 0.09, <0.001, 0.08, 0.85, 0.35, 0.07, 0.91, 0.24, 0.17, 0.07, 
0.001, 0.008, 0.20, 0.38, 0.10, 0.05, 0.17, 0.50, 0.15, 0.07, 0.03, 0.09, 0.04, 0.42, 0.13, 0.01, 
0.29, 0.10, 0.53, 0.24, 0.22, 0.04, 0.05, 0.11, 0.02, 0.02, nd, nd, nd, nd, nd, nd, nd, nd, nd 

Table 2: Relative abundance of glucose-6-phosphate dehydrogenase activity (GPD), 
DmAMP1 and RsAFP2 in the extracellular fluid (EF) and intracellular extract (EE) fractions 
obtained from transgenic Arabidopsis plants. 

Construct     Relative abundance (1) (%) of 
              GPD           DmAMP1        RsAFP2 
              EF     EE     EF     EE     EF     EE 

pFAJ3105      17     83     93     7      92     8 
pFAJ3106      17     83     94     6      60     40 
pFAJ3108      20     80     98     2      75     25 

(1) Relative abundance is expressed as % of the sum of the contents in the EF and EE fractions. 
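
By way of illustration only, the relative abundances of Table 2 can be recomputed and compared with the cytosolic marker as in the following Python sketch. The function and variable names are hypothetical and form no part of the procedure described; the input values are the percentages listed in Table 2, and the comparison with GPD is merely one simple way of expressing that the defensin content of the EF fraction exceeds what leakage of intracellular material would explain.

```python
# Contents of each analyte in the EF and EE fractions, as listed in Table 2.
# Any unit may be used, because only the EF/(EF + EE) ratio matters.
TABLE_2 = {
    "pFAJ3105": {"GPD": (17, 83), "DmAMP1": (93, 7), "RsAFP2": (92, 8)},
    "pFAJ3106": {"GPD": (17, 83), "DmAMP1": (94, 6), "RsAFP2": (60, 40)},
    "pFAJ3108": {"GPD": (20, 80), "DmAMP1": (98, 2), "RsAFP2": (75, 25)},
}


def relative_abundance(ef, ee):
    """Return (EF %, EE %) relative to the summed content of both fractions."""
    total = ef + ee
    if total <= 0:
        raise ValueError("total content must be positive")
    return 100.0 * ef / total, 100.0 * ee / total


def enriched_in_ef(table):
    """For each construct, flag analytes whose EF share exceeds that of GPD.

    GPD is the cytosolic marker, so its EF share estimates contamination of
    the extracellular fluid with intracellular components.
    """
    result = {}
    for construct, analytes in table.items():
        marker_ef, _ = relative_abundance(*analytes["GPD"])
        result[construct] = {
            name: relative_abundance(*pair)[0] > marker_ef
            for name, pair in analytes.items()
            if name != "GPD"
        }
    return result


if __name__ == "__main__":
    for construct, flags in enriched_in_ef(TABLE_2).items():
        print(construct, flags)
```
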
Purification of proteins processed from polyprotein precursor construct 3105 

Transgenic line 14 from the population transformed with construct 3105 was further bred to 
obtain plants homozygous for the transgene. The DmAMP1-CRPs and RsAFP2-CRPs were 
purified by reversed phase chromatography from extracellular fluid prepared from leaves of 
this line. To this end, leaves were vacuum infiltrated with a buffer containing 50 mM MES 
(pH 6) and a mixture of protease inhibitors (1 mM phenylmethylsulfonylfluoride, 1 mM 
N-ethylmaleimide, 5 mM EDTA and 0.02 mM pepstatin A), and the extracellular fluid was 
collected by centrifugation. This procedure avoids homogenization and hence exposure of the 
DmAMP1-CRPs and RsAFP2-CRPs to compartmentalized proteases. The collected 
extracellular fluid was analyzed by RP-HPLC on a C8-silica column (Microsorb-MV, 4.6 x 
250 mm, Rainin) and the fractions were tested for the presence of DmAMP1-CRPs and 
RsAFP2-CRPs by ELISA using antibodies raised against DmAMP1 and RsAFP2, respectively. 
The result of this analysis for the Arabidopsis transgenic line 14 transformed with construct 
3105 is shown in figure 15. DmAMP1-CRPs eluted in two peaks, the latter of which eluted at a 
position very close to that of authentic DmAMP1. RsAFP2-CRPs were found in a single 
peak that was well separated from the DmAMP1-CRP peaks and eluted at a position very 
close to that of authentic RsAFP2. None of the fractions reacted with both the anti-DmAMP1 
and anti-RsAFP2 antibodies, indicating that an uncleaved fusion protein was 
absent from the extracellular fluid. Based on comparison of the peak areas of the 
DmAMP1-CRPs and RsAFP2-CRPs with those of a series of standards consisting of authentic 
DmAMP1 and RsAFP2, respectively, it was judged that the extract for the line transformed with 
construct 3105 contained about equal amounts of DmAMP1-CRPs and RsAFP2-CRPs. This 
indicates that cleavage of the polyprotein precursor in this line results in about equimolar 
amounts of DmAMP1-CRPs and RsAFP2-CRPs. Very similar chromatograms were 
obtained upon analysis of extracellular fluid prepared from transgenic line 2 (results not 
shown), indicating that the chromatographic pattern of DmAMP1-CRPs and RsAFP2-CRPs 
is independent of the transgenic line tested. 

To test whether the purification procedure based on extracellular fluid preparation reflects 
the true DmAMP1-CRP and RsAFP2-CRP composition of the transgenic Arabidopsis 
leaves, an alternative purification procedure was developed starting from a crude leaf extract. 
To this end, leaves were homogenized under liquid nitrogen and extracted with 50 mM MES 
(pH 6) containing a mixture of protease inhibitors (1 mM phenylmethylsulfonylfluoride, 
1 mM N-ethylmaleimide, 5 mM EDTA and 0.02 mM pepstatin A). The homogenate was 
cleared by centrifugation (10 min at 10000 x g). The supernatant was then fractionated by 
ion exchange chromatography (IEC) and subsequently by reversed phase chromatography 
(RPC). After each separation, fractions were collected and assessed for DmAMP1-CRPs and 
RsAFP2-CRPs using two different ELISA assays with antibodies raised against DmAMP1 and 
RsAFP2, respectively. IEC was performed by passing the extract over a cation exchange 
column (Mono S, 5 x 50 mm, Pharmacia) at pH 6. When the column was eluted with a 
linear gradient of 0 to 0.5 M NaCl in 50 mM N-morpholinoethanesulfonic acid (MES) at 
pH 6, DmAMP1-CRPs were detected in fractions eluting between 0.17 and 0.33 M NaCl, 
while RsAFP2-CRPs eluted between 0.24 and 0.49 M NaCl. Fractions containing either 
DmAMP1-CRPs or RsAFP2-CRPs were pooled into two fractions (0.17 to 0.33 M NaCl 
and 0.33 to 0.49 M NaCl), which were each subjected to RPC on a C8-silica column 
(Microsorb-MV, 4.6 x 250 mm, Rainin) eluted with a linear gradient of acetonitrile (Figure 
16). DmAMP1-CRPs eluted in two peaks, the latter of which eluted at a position very close 
to that of authentic DmAMP1. RsAFP2-CRPs were found in a single peak that was well 
separated from the DmAMP1-CRP peaks and eluted at a position very close to that of 
authentic RsAFP2. Again, none of the fractions reacted with both the anti-DmAMP1 and 
anti-RsAFP2 antibodies, indicating that an uncleaved fusion protein was not present in the 
extracts. 



PPD 50378/GB 




The different DmAMP1-CRPs and RsAFP2-CRPs purified from extracellular fluid were 
subjected to N-terminal amino acid sequence analysis (procedures as described in Cammue et 
al., 1992, J. Biol. Chem. 267, 2228-2233) as well as to MALDI-TOF (matrix-assisted laser 
desorption ionization-time of flight) mass spectrometry (Mann and Talbo, 1996, Curr. 
Opinion Biotechnol. 7, 11-19). The C-terminal amino acid was determined based on the best 
approximation of the predicted theoretical mass by the experimentally determined mass 
(Table 3). Both the minor DmAMP1-CRP, p3105EF1, and the major DmAMP1-CRP, 
p3105EF2 (protein codes as in figure 15 and Table 3), had exactly the same N-terminal 
sequence as mature DmAMP1. p3105EF1 and p3105EF2 had masses that were consistent 
with the presence of a single additional serine residue at their C-terminal end compared to 
authentic DmAMP1. However, while the mass of p3105EF2 corresponded exactly (within 
experimental error) to that calculated for a DmAMP1 derivative with a C-terminal serine 
(hereafter called DmAMP1+S), that of p3105EF1 was in excess by about 8 dalton relative to 
the calculated mass for DmAMP1+S. Hence, this protein might be a DmAMP1+S derivative 
with reduced disulfide bridges. The RsAFP2-CRP fraction p3105EF3 represents, based on 
N-terminal sequence and mass data, an RsAFP2 derivative with the additional pentapeptide 
sequence DVEPG at its N-terminus. This protein is further referred to as DVEPG+RsAFP2. 
The different DmAMP1-CRPs and RsAFP2-CRPs purified from total leaf extract were 
analyzed in the same way. The analyses indicated that the same molecular species were 
present in the total leaf extract, i.e. DmAMP1+S, a putatively reduced form of DmAMP1+S, 
and DVEPG+RsAFP2 (Table 3). 
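
The mass reasoning above can be illustrated, purely as a sketch, by computing average masses from the amino acid sequences. The residue masses used are the commonly tabulated standard average values and are not taken from this specification; the assumption that the eight cysteines of each defensin form four disulfide bridges in the oxidized form is likewise an assumption of the sketch, and all function and variable names are hypothetical. Under these assumptions the oxidized masses should fall close to the theoretical masses of Table 3, and full reduction of four bridges adds the mass of eight hydrogen atoms, i.e. about 8 dalton.

```python
# Standard average residue masses in Da (commonly tabulated values; an
# assumption of this sketch, not data from the text).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153      # one water per free peptide chain
HYDROGEN = 1.00794   # two hydrogens are lost per disulfide bridge formed

# Mature domains as read from the translated construct sequences.
DMAMP1 = "ELCEKAS" "KTWSGNCGNTGHCDNQCKSW" "EGAAHGACHVRNGKHMCFCY" "FNC"
RSAFP2 = "QKLCQRPSGTWSG" "VCGNNNACKNQCIRLEKARH" "GSCNYVFPAHKCICYFPC"

DMAMP1_S = DMAMP1 + "S"          # DmAMP1 with one extra C-terminal serine
DVEPG_RSAFP2 = "DVEPG" + RSAFP2  # RsAFP2 with five extra N-terminal residues


def average_mass(sequence, disulfide_bridges=0):
    """Average mass (Da) of a peptide with the given number of S-S bridges."""
    if 2 * disulfide_bridges > sequence.count("C"):
        raise ValueError("more bridges requested than cysteine pairs available")
    mass = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER
    return mass - 2 * HYDROGEN * disulfide_bridges


def reduction_shift(disulfide_bridges):
    """Mass gained (Da) when the given number of bridges is fully reduced."""
    return 2 * HYDROGEN * disulfide_bridges


if __name__ == "__main__":
    print("DmAMP1+S, 4 S-S:", round(average_mass(DMAMP1_S, 4), 2))
    print("DmAMP1+S, reduced:", round(average_mass(DMAMP1_S, 0), 2))
    print("DVEPG+RsAFP2, 4 S-S:", round(average_mass(DVEPG_RSAFP2, 4), 2))
```
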

The purified fractions containing the major processing products, DmAMP1+S and 
DVEPG+RsAFP2 respectively, were subjected to an antimicrobial activity test using the 
fungus Fusarium culmorum according to the procedure outlined by Cammue et al. (1992, J. 
Biol. Chem. 267, 2228-2233). The specific antimicrobial activity, expressed as the protein 
concentration required for 50 % growth inhibition of the test organism, of purified 
DmAMP1+S was identical to that of authentic DmAMP1. The specific antimicrobial 
activity of purified DVEPG+RsAFP2 was about 2-fold lower than that of authentic 
RsAFP2. The slight drop in specific antimicrobial activity of DVEPG+RsAFP2 is most 
likely due to the presence of 5 additional N-terminal amino acids. Nevertheless, our data 
prove that processing of the polyprotein precursors in transgenic plants can result in the 
release of bioactive proteins. 



Analysis of the AFPs produced in transgenic plants transformed with construct 3105 reveals 
that the precursor is apparently processed by three cleavage steps (Figure 17): 

(i) the precursor is cleaved at the C-terminal end of the leader peptide in the same way as for 
the authentic DmAMP1 precursor; (ii) the precursor is cleaved at the C-terminal end of the 
first amino acid of the linker peptide, thus releasing DmAMP1+S; (iii) the precursor is 
further processed at the N-terminal end of the fifth last residue of the linker peptide, thus 
releasing DVEPG+RsAFP2. It is not known which proteases effect the observed cleavages, 
nor how many different proteases are involved. Cleavages in the linker peptides might 
involve only endoproteinases or result from the coordinated action of endoproteinases and 
exopeptidases that further trim the cleavage products at their ends. Processing at the 
C-terminal side of the linker peptide occurs between the two acidic residues E and D. The 
acidic doublet might be a target sequence for a specific endoproteinase. An aspartic 
endoproteinase that is able to cleave between two consecutive acidic residues has previously 
been purified from Arabidopsis seeds (D'Hondt et al., 1993, J. Biol. Chem. 268, 
20884-20891). It is worthwhile to mention that the sequence ED occurs at the very C-terminal 
end in five out of six internal propeptides of the IbAMP1 polyprotein precursor (Tailor et al., 
1997, J. Biol. Chem. 272, 24480-24487). In one of the six internal IbAMP propeptides, more 
precisely the one that was used in construct 3105, the ED sequence does not occur at the 
C-terminal end of the propeptide but is separated by 4 amino acids from this end. Processing 
of this propeptide in Impatiens balsamina might involve cleavage of the ED sequence 
followed by partial N-terminal trimming of the resulting protein by an aminopeptidase. We 
predict that an internal propeptide resembling the IbAMP1 propeptide used in construct 3105, 
but in which the ED dipeptidic sequence is moved to the C-terminal end of the propeptide, 
would result in a cleavage product with only one or no extra N-terminal amino acids in the 
protein located C-terminally from the internal propeptide. Alternatively, another IbAMP1 
propeptide which already has an ED sequence at its C-terminal end (Tailor et al., 1997, J. 
Biol. Chem. 272, 24480-24487) or a related sequence might give a similar improvement of 
processing accuracy. 
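
The three cleavage steps described above can be sketched in Python as simple string slicing on the precursor encoded by construct 3105. This is an illustration only: the domain sequences are read from the translated construct sequence, the function name and dictionary keys are hypothetical, and the sketch models the observed end products rather than the action of any particular protease.

```python
# Domains of the construct 3105 precursor, read from the translated sequence.
LEADER = "MVNRSVAFSAFVLIL" "FVLAISDIASVSG"
DMAMP1 = "ELCEKAS" "KTWSGNCGNTGHCDNQCKSW" "EGAAHGACHVRNGKHMCFCY" "FNC"
LINKER = "SNAADEVATPEDVEPG"  # internal propeptide used as linker
RSAFP2 = "QKLCQRPSGTWSG" "VCGNNNACKNQCIRLEKARH" "GSCNYVFPAHKCICYFPC"


def process_precursor(leader, first, linker, second):
    """Apply the three observed cleavages and return the resulting pieces."""
    precursor = leader + first + linker + second
    cut_i = len(leader)                                # (i) after the leader peptide
    cut_ii = cut_i + len(first) + 1                    # (ii) after the first linker residue
    cut_iii = cut_i + len(first) + len(linker) - 5     # (iii) before the fifth last linker residue
    return {
        "leader": precursor[:cut_i],
        "first_product": precursor[cut_i:cut_ii],
        "excised_linker": precursor[cut_ii:cut_iii],
        "second_product": precursor[cut_iii:],
    }


if __name__ == "__main__":
    for name, seq in process_precursor(LEADER, DMAMP1, LINKER, RSAFP2).items():
        print(f"{name}: {seq}")
```

In this sketch the excised linker fragment ends in E and the second product starts with D, which corresponds to cleavage between the two acidic residues of the ED doublet.
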
Table 3: Mass determined by MALDI-TOF and N-terminal sequence determined by 
automated Edman degradation of DmAMP1-CRP and RsAFP2-CRP fractions 
purified as described in Figures 15 and 16. Also shown is the predicted 
C-terminal sequence that gives best correspondence between experimental mass and 
theoretical mass. 

Construct  Protein fraction      Mass determined  Determined   Predicted    Theoretical mass 
           (Figures 15 and 16)   by MALDI-TOF     N-terminal   C-terminal   for predicted 
                                                  sequence     sequence     sequence 

3105       p3105EF1              5614 ± 5         ELCEKAS      CYFNCS       5604.25 
           p3105EF2              5602 ± 5         ELCEKAS      CYFNCS       5604.25 
           p3105EF3              6223 ± 6         DVEPGQK      ICYFPC       6225.15 
           p3105TE1              5610 ± 5         ELCEKAS      CYFNCS       5604.25 
           p3105TE2              5604 ± 5         ELCEKAS      CYFNCS       5604.25 
           p3105TE3              6224 ± 6         DVEPGQK      ICYFPC       6225.15 


pFAJ3105 

XhoI 
CTCGAGTATTTTTACAACAATTACCAACAACAACAAACAACAAACAACATTACAATTACT 

NcoI 
ATTTACAATTACACCATGGTGAATCGGTCGGTTGCGTTCTCCGCGTTCGTTCTGATCCTT 
MVNRSVAFSAFVLIL 

TTCGTGCTCGCCATCTCAGATATCGCATCCGTTAGTGGAGAACTATGCGAGAAAGCTAGC 
FVLAISDIASVSGELCEKAS 

AAGACGTGGTCGGGCAACTGTGGCAACACGGGACATTGTGACAACCAATGTAAATCATGG 
KTWSGNCGNTGHCDNQCKSW 

GAGGGTGCGGCCCATGGAGCGTGTCATGTGCGTAACGGGAAACACATGTGTTTCTGTTAC 
EGAAHGACHVRNGKHMCFCY 

TTCAATTGTTCCAACGCTGCTGACGAGGTGGCTACCCCAGAGGACGTGGAGCCAGGACAG 
FNCSNAADEVATPEDVEPGQ 

AAGTTGTGCCAAAGGCCAAGTGGGACATGGTCAGGAGTCTGTGGAAACAATAACGCATGC 
KLCQRPSGTWSGVCGNNNAC 

AAGAATCAGTGCATTAGACTTGAGAAAGCACGACATGGATCTTGCAACTATGTCTTCCCA 
KNQCIRLEKARHGSCNYVFP 

SacI 
GCTCACAAGTGTATCTGCTACTTTCCTTGTTAATAGGAGCTC 
AHKCICYFPC - - 

pFAJ3106 

XhoI 
CTCGAGTATTTTTACAACAATTACCAACAACAACAAACAACAAACAACATTACAATTACT 

NcoI 
ATTTACAATTACACCATGGTGAATCGGTCGGTTGCGTTCTCCGCGTTCGTTCTGATCCTT 
MVNRSVAFSAFVLIL 

TTCGTGCTCGCCATCTCAGATATCGCATCCGTTAGTGGAGAACTATGCGAGAAAGCTAGC 
FVLAISDIASVSGELCEKAS 

AAGACGTGGTCGGGCAACTGTGGCAACACGGGACATTGTGACAACCAATGTAAATCATGG 
KTWSGNCGNTGHCDNQCKSW 

GAGGGTGCGGCCCATGGAGCGTGTCATGTGCGTAACGGGAAACACATGTGTTTCTGTTAC 
EGAAHGACHVRNGKHMCFCY 

TTCAATTGTAAAAAAGCCGAAAAGCTTGCTCAAGACAAACTTAAAGCCGAACAACTCATC 
FNCKKAEKLAQDKLKAEQLI 

GGAAAGAGGCAGAAGTTGTGCCAAAGGCCAAGTGGGACATGGTCAGGAGTCTGTGGAAAC 
GKRQKLCQRPSGTWSGVCGN 

AATAACGCATGCAAGAATCAGTGCATTAGACTTGAGAAAGCACGACATGGATCTTGCAAC 
NNACKNQCIRLEKARHGSCN 

SacI 
TATGTCTTCCCAGCTCACAAGTGTATCTGCTACTTTCCTTGTTAATAGGAGCTC 
YVFPAHKCICYFPC - - 

pFAJ3107 

XhoI 
CTCGAGTATTTTTACAACAATTACCAACAACAACAAACAACAAACAACATTACAATTACT 

NcoI 
ATTTACAATTACACCATGGTGAATCGGTCGGTTGCGTTCTCCGCGTTCGTTCTGATCCTT 
MVNRSVAFSAFVLIL 

TTCGTGCTCGCCATCTCAGATATCGCATCCGTTAGTGGAGAACTATGCGAGAAAGCTAGC 
FVLAISDIASVSGELCEKAS 

AAGACGTGGTCGGGCAACTGTGGCAACACGGGACATTGTGACAACCAATGTAAATCATGG 
KTWSGNCGNTGHCDNQCKSW 

GAGGGTGCGGCCCATGGAGCGTGTCATGTGCGTAACGGGAAACACATGTGTTTCTGTTAC 
EGAAHGACHVRNGKHMCFCY 

TTCAATTGTAAAAAAGCCGAAAAGCTTGCTCAAGACAAACTTAAAGCCGAACAACTCGCT 
FNCKKAEKLAQDKLKAEQLA 

CAAGACAAACTTAATGCCCAAAAGCTTGACCGTGATGCCAAGAAAGTGGTTCCAAACGTT 
QDKLNAQKLDRDAKKVVPNV 

GAACATCCGATCGGAAAGAGGCAGAAGTTGTGCCAAAGGCCAAGTGGGACATGGTCAGGA 
EHPIGKRQKLCQRPSGTWSG 

GTCTGTGGAAACAATAACGCATGCAAGAATCAGTGCATTAGACTTGAGAAAGCACGACAT 
VCGNNNACKNQCIRLEKARH 

GGATCTTGCAACTATGTCTTCCCAGCTCACAAGTGTATCTGCTACTTTCCTTGTTAATAG 
GSCNYVFPAHKCICYFPC - - 

SacI 
GAGCTC 
pFAJ3108 

XhoI 
CTCGAGTATTTTTACAACAATTACCAACAACAACAAACAACAAACAACATTACAATTACT 

NcoI 
ATTTACAATTACACCATGGTGAATCGGTCGGTTGCGTTCTCCGCGTTCGTTCTGATCCTT 
MVNRSVAFSAFVLIL 

TTCGTGCTCGCCATCTCAGATATCGCATCCGTTAGTGGAGAACTATGCGAGAAAGCTAGC 
FVLAISDIASVSGELCEKAS 

AAGACGTGGTCGGGCAACTGTGGCAACACGGGACATTGTGACAACCAATGTAAATCATGG 
KTWSGNCGNTGHCDNQCKSW 

GAGGGTGCGGCCCATGGAGCGTGTCATGTGCGTAACGGGAAACACATGTGTTTCTGTTAC 
EGAAHGACHVRNGKHMCFCY 

TTCAATTGTGCCAGTACTACTGTGGATCACCAAGCTGATGTTGCTGCCACCAAAACTATC 
FNCASTTVDHQADVAATKTI 

GGAAAGAGGCAGAAGTTGTGCCAAAGGCCAAGTGGGACATGGTCAGGAGTCTGTGGAAAC 
GKRQKLCQRPSGTWSGVCGN 

AATAACGCATGCAAGAATCAGTGCATTAGACTTGAGAAAGCACGACATGGATCTTGCAAC 
NNACKNQCIRLEKARHGSCN 

SacI 
TATGTCTTCCCAGCTCACAAGTGTATCTGCTACTTTCCTTGTTAATAGGAGCTC 
YVFPAHKCICYFPC - - 

pFAJ3109 

XhoI 
CTCGAGTATTTTTACAACAATTACCAACAACAACAAACAACAAACAACATTACAATTACT 

NcoI 
ATTTACAATTACACCATGGTGAATCGGTCGGTTGCGTTCTCCGCGTTCGTTCTGATCCTT 
MVNRSVAFSAFVLIL 

TTCGTGCTCGCCATCTCAGATATCGCATCCGTTAGTGGAGAACTATGCGAGAAAGCTAGC 
FVLAISDIASVSGELCEKAS 

AAGACGTGGTCGGGCAACTGTGGCAACACGGGACATTGTGACAACCAATGTAAATCATGG 
KTWSGNCGNTGHCDNQCKSW 

GAGGGTGCGGCCCATGGAGCGTGTCATGTGCGTAATGGGAAACACATGTGTTTCTGTTAC 
EGAAHGACHVRNGKHMCFCY 

SacI 
TTCAATTGTTGAGCTC 
FNC - 
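
The amino acid lines printed beneath the nucleotide sequences can be checked by translating the open reading frame with the standard genetic code. The following Python sketch does so for the shortest construct, pFAJ3109, starting at the first ATG (within the NcoI site) and stopping at the first stop codon. It is offered as an illustration only; the function and variable names are hypothetical, and the compact codon table is the standard genetic code written in the usual TCAG order.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# pFAJ3109 insert from the NcoI line to the SacI site, as listed above.
PFAJ3109 = (
    "ATTTACAATTACACCATGGTGAATCGGTCGGTTGCGTTCTCCGCGTTCGTTCTGATCCTT"
    "TTCGTGCTCGCCATCTCAGATATCGCATCCGTTAGTGGAGAACTATGCGAGAAAGCTAGC"
    "AAGACGTGGTCGGGCAACTGTGGCAACACGGGACATTGTGACAACCAATGTAAATCATGG"
    "GAGGGTGCGGCCCATGGAGCGTGTCATGTGCGTAATGGGAAACACATGTGTTTCTGTTAC"
    "TTCAATTGTTGAGCTC"
)


def translate_orf(dna):
    """Translate from the first ATG up to (not including) the first stop codon."""
    dna = "".join(dna.split()).upper()
    start = dna.find("ATG")
    if start < 0:
        raise ValueError("no ATG start codon found")
    protein = []
    for i in range(start, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # stop codon reached
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    print(translate_orf(PFAJ3109))
```
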
CLAIMS 

1. A method for the expression of multiple proteins in a transgenic plant comprising 
inserting into the genome of said plant a DNA sequence comprising a promoter region 
operably linked to a signal sequence said signal sequence being operably linked to two 
or more protein encoding regions and a 3'-terminator region wherein said protein 
encoding regions are separated from each other by a DNA sequence coding for a linker 
propeptide said propeptide providing a cleavage site whereby the expressed polyprotein 
is post-translationally processed into the component protein molecules. 

2. A method of improving expression levels of one or more proteins in a transgenic plant 
comprising inserting into the genome of said plant a DNA sequence comprising a 
promoter region operably linked to two or more protein encoding regions and a 
3'-terminator region wherein said protein encoding regions are separated from each other 
by a DNA sequence coding for a linker propeptide said propeptide providing a 
cleavage site whereby the expressed polyprotein is post-translationally processed into 
the component protein molecules. 

3. A method for improving expression levels of one or more proteins in a transgenic plant 
according to claim 2 wherein said promoter region is operably linked to a signal 
sequence said signal sequence being operably linked to two or more protein encoding 
regions and a 3'-terminator region wherein said protein encoding regions are separated 
from each other by a DNA sequence coding for a linker propeptide said propeptide 
providing a cleavage site whereby the expressed polyprotein is post-translationally 
processed into the component protein molecules. 

4. A method according to any of the preceding claims wherein at least 40% of the 
sequence of said linker propeptide consists of stretches of either two to five 
consecutive hydrophobic residues selected from alanine, valine, isoleucine, 
methionine, leucine, phenylalanine, tryptophan and tyrosine or stretches of two to five 
hydrophilic residues selected from aspartic acid, glutamic acid, lysine, arginine, 
histidine, serine, threonine, glutamine and asparagine. 
5. A method according to any of the preceding claims wherein said linker propeptide has 
within 7 residues of its N- or C-terminal cleavage site a sequence with two to five 
consecutive acidic residues, two to five basic residues or two to five consecutive 
intermixed acidic and basic residues. 

6. A method according to any of the preceding claims wherein the DNA sequence 
encoding said linker propeptide encodes a propeptide isolatable from a plant protein. 

7. A method according to claim 6 wherein the plant protein is a precursor of a plant 
defensin, or a hevein-type antimicrobial protein. 

8. A method according to claim 6 wherein the plant protein is an antimicrobial protein 
derived from the genus Impatiens. 

9. A method according to claim 7 wherein the propeptide is a C-terminal propeptide from 
Dm-AMP1 or Ac-AMP2 as described in Figure 2. 

10. A method according to claim 8 wherein the propeptide is isolatable from the Ib-AMP 
precursor or the Ib-AMP precursor as described in Figure 2. 


11. A method according to any of the preceding claims wherein the linker propeptide has 
a protease processing site engineered at either or both ends thereof. 

12. A method according to claim 11 wherein the protease processing site is a 
subtilisin-like protease processing site. 

13. A method according to any of claims 1 and 3 to 5 wherein the signal sequence is 
derived from a plant defensin gene. 

14. A method according to any of the preceding claims wherein one or more of the 
multiple proteins is a defense protein. 
15. Use of propeptides derived from plant derived proteins as cleavable linkers in 
polyprotein precursors synthesized via the secretory pathway in transgenic plants. 

16. Use of a propeptide according to claim 15 wherein the protein is a precursor of a plant 
defensin, or a hevein-type antimicrobial protein or is isolatable from the genus 
Impatiens. 

17. Use of a propeptide as a cleavable linker in polyprotein precursors synthesized via the 
secretory pathway in transgenic plants wherein said propeptide linker is as defined in 
claim 4 or claim 5. 

18. Use of a propeptide sequence rich in the small amino acids A, V, S and T and 
containing dipeptidic sequences consisting of either two acidic residues, two basic 
residues or one acidic and one basic residue as a cleavable linker sequence wherein 
said sequence is isolatable from a plant defensin or a hevein-type antimicrobial peptide. 

19. A DNA construct comprising a DNA sequence comprising a promoter region operably 
linked to a plant derived signal sequence said signal sequence being operably linked to 
two or more protein encoding regions and a 3'-terminator region wherein said protein 
encoding regions are separated from each other by a DNA sequence coding for a linker 
propeptide said propeptide providing a post-translational cleavage site. 

20. A DNA construct comprising a DNA sequence comprising a promoter region operably 
linked to two or more protein encoding regions and a 3'-terminator region wherein said 
protein encoding regions are separated from each other by a DNA sequence coding for 
a linker propeptide encoding a C-terminal propeptide from the Dm-AMP gene or from 
the Ac-AMP gene said propeptide providing a post-translational cleavage site. 

21. A DNA construct according to claim 19 or claim 20 wherein the DNA sequence 
encoding the linker propeptide additionally comprises one or more protease recognition 
sites at either or both ends thereof. 
22. A vector comprising a DNA construct according to any of claims 19 to 21. 

23. A transgenic plant transformed with a DNA construct or a vector according to any of 
claims 19 to 22. 
pBIN19 backbone (including oriRK2 and nptII) 

Symbols 

RB: right border of T-DNA 
Tnos: terminator of T-DNA nopaline synthase gene 
MP Rs-AFP2: mature protein domain of Rs-AFP2 
LP: Ib-AMP internal propeptide 
MP Dm-AMP1: mature protein domain of Dm-AMP1 cDNA 
SP Dm-AMP1: signal peptide domain of Dm-AMP1 cDNA 
TMV: tobacco mosaic virus 5' leader sequence 
Penh35S: promoter of 35S RNA of cauliflower mosaic virus with duplicated enhancer region 
Pnos: promoter of T-DNA nopaline synthase gene 
bar: basta resistance encoding gene 
Tg7: terminator of T-DNA gene 7 
LB: left border of T-DNA 
*: unique restriction site 
pBIN19 backbone (including oriRK2 and nptII) 

Symbols 

RB: right border of T-DNA 
Tnos: terminator of T-DNA nopaline synthase gene 
MP Rs-AFP2: mature protein domain of Rs-AFP2 
LP: part of Dm-AMP1 C-terminal propeptide and subtilisin-like protease recognition site IGKR 
MP Dm-AMP1: mature protein domain of Dm-AMP1 cDNA 
SP Dm-AMP1: signal peptide domain of Dm-AMP1 cDNA 
TMV: tobacco mosaic virus 5' leader sequence 
Penh35S: promoter of 35S RNA of cauliflower mosaic virus with duplicated enhancer region 
Pnos: promoter of T-DNA nopaline synthase gene 
bar: basta resistance encoding gene 
Tg7: terminator of T-DNA gene 7 
LB: left border of T-DNA 
pBIN19 backbone (including oriRK2 and nptII) 

Symbols 

RB: right border of T-DNA 
Tnos: terminator of T-DNA nopaline synthase gene 
MP Rs-AFP2: mature protein domain of Rs-AFP2 
LP: Dm-AMP1 C-terminal propeptide domain and subtilisin-like protease recognition site IGKR 
MP Dm-AMP1: mature protein domain of Dm-AMP1 cDNA 
SP Dm-AMP1: signal peptide domain of Dm-AMP1 cDNA 
TMV: tobacco mosaic virus 5' leader sequence 
Penh35S: promoter of 35S RNA of cauliflower mosaic virus with duplicated enhancer region 
Pnos: promoter of T-DNA nopaline synthase gene 
bar: basta resistance encoding gene 
Tg7: terminator of T-DNA gene 7 
LB: left border of T-DNA 
*: unique restriction site 
Symbols 

RB: right border of T-DNA 
Tnos: terminator of T-DNA nopaline synthase gene 
MP Rs-AFP2: mature protein domain of Rs-AFP2 
Tg7: terminator of T-DNA gene 7 

Symbols 

RB: right border of T-DNA 
Tnos: terminator of T-DNA nopaline synthase gene 
MP Dm-AMP1: mature protein domain of Dm-AMP1 
Figure 16: Amino acid sequence of the polyprotein precursor encoded by construct 
3105. Dashes indicate omission from the full sequence for sake of brevity. 
The sequence in italic is the DmAMP1 leader peptide, the underlined 
sequence is mature DmAMP1, the bold sequence is the internal propeptide, 
the double underlined sequence is mature RsAFP2. Arrows indicate 
processing sites according to the N-terminal sequence and MALDI-TOF 
analyses of purified DmAMP1-CRPs and RsAFP2-CRPs. 